`for i in $(seq 1 100); do curl -H "X-Forwarded-For: 191.100.20.$i" ...`
was dumping 100 distinct IPs into AttackerDetail's LEAKED IPs row,
drowning the rest of the ORIGIN section. The 100-IP wall is itself a
signal (WAF-bypass-list probing) that deserves a short badge, not a
flood.
Backend:
- get_attacker_ip_leaks gains `limit: int = 10` parameter — caller
only ever needs a sample, not the full set.
- New count_attacker_ip_leaks() returns the unbounded COUNT(*) via
one cheap SQL aggregate.
- Detail endpoint returns {ip_leaks: [first 10], ip_leaks_total: N}
so the UI can render a rotation badge independent of list length.
UI:
- New LeakedIPsRow component. First 5 distinct IPs rendered inline
with hover tooltips (unchanged). When > 5, a `+ N more` expand
button reveals the rest of the sample; when total exceeds the
10-row cap, a subtle `(+M beyond sample)` note appears.
- When total ≥ 20, a red `ROTATION · N` tag renders leading the
row with a tooltip explaining the semantic: "almost certainly
XFF-rotation / WAF-bypass probing, not a real attribution leak."
DB churn is deliberately not capped — 100k rows × ~500 B is tolerable.
If it becomes a problem we can add an ingester-side count-and-skip;
for now the UX fix is the whole story.
Added test_ip_leaks_total_reported_separately_from_list asserting
the endpoint shape matches what the UI consumes.
Every http_useragent bounty now carries a `category` label plus an
optional tool name and a signals list. The main analytic win is the
`nonstandard` bucket — UAs like "FUCKYOU/1.0" or custom one-off
scanner labels that don't match any known pattern, which today
silently blend into the generic fingerprint list.
Buckets (priority order):
- scanner: nmap, nuclei, sqlmap, gobuster, nikto, masscan, zgrab,
ffuf, wpscan, katana, burp, acunetix, nessus, openvas, arachni,
whatweb, wappalyzer, etc.
- cli: curl, wget, httpie, xh, fetch.
- library: python-requests, aiohttp, httpx, urllib, Go stdlib, Java,
okhttp, Apache HttpClient, axios, node-fetch, got, undici, PHP,
Guzzle, Ruby stdlib, Faraday, .NET, PostmanRuntime, Insomnia, etc.
- bot: anything containing bot / crawler / spider / slurp / monitor
(catches Googlebot, bingbot, Baiduspider — many of which ship a
Mozilla/5.0 prefix, so the bot check runs BEFORE the browser
regex).
- browser: Mozilla/5.0-prefixed UAs that aren't bots.
- nonstandard: anything else. The interesting bucket.
- empty: literal empty User-Agent header.
Side signals computed regardless of category: suspicious_short (<8
chars), suspicious_long (>512 chars), nonprintable (control chars),
injection_like (SQLi / XSS / path-traversal / Log4Shell markers).
A sqlmap UA with a literal SQL-injection payload embedded fires
category=scanner + injection_like — the combination tells the
analyst the tool is being operated manually vs. on default config.
Classification is deterministic (same UA string → same tuple) so
add_bounty's payload-hash dedup continues to collapse repeat rows.
UI renderer upgraded from FpGeneric to a dedicated FpUserAgent that
colours the category tag by risk (scanner=alert-red,
nonstandard=warn-yellow, browser=accent-green, etc.) and renders
each signal as its own chip. Makes the interesting rows pop in the
fingerprints panel.
Also fixed: the ingester was using `_headers.get("User-Agent") or
_headers.get("user-agent")`, which short-circuits away empty-string
UAs. An explicit empty UA is itself a signal (real clients always
send something) — now captured.
An attacker hitting /admin with `X-Forwarded-For: 127.0.0.1` was
previously flagged as an IP leak. It isn't — that's the classic
IP-allowlist / WAF-bypass payload ("treat me as localhost and skip
your auth checks"). Misclassifying it as "LEAKED IPs" in the UI
confuses analysts and burns trust in the signal.
Split by claim category. After pulling the left-most claimed IP
from the proxy header, classify:
- public (routable) → bounty_type=ip_leak (real attribution leak;
the attacker's upstream proxy forwarded their real IP).
- loopback / private / link-local / multicast / reserved /
unspecified → bounty_type=fingerprint, fingerprint_type=
spoofed_source (WAF-bypass / allowlist-probing attempt; the
attacker is telling us they know what XFF does).
- unparseable → dropped.
Same extraction pipeline; diverges only at the last step. A new
shared _classify_proxy_header_claim returns (kind, payload);
_detect_ip_leak keeps its public-only contract for backward-
compat; _detect_spoofed_source is the new sibling.
UI renderer FpSpoofedSource shows the claimed IP in warn color with
the claim_category tag (LOOPBACK / PRIVATE / ...) and a WAF-BYPASS
ATTEMPT badge — distinct visual from the "LEAKED IPs" row which
stays reserved for genuine public-IP leaks.
Test addresses updated: RFC 5737 doc ranges (198.51.100.0/24,
203.0.113.0/24) are flagged `is_reserved` in Python's ipaddress
module, so they now correctly belong to the spoof bucket — tests
that meant to exercise real public IPs now use 8.8.8.8 / 1.1.1.1 /
Cloudflare DNS. Added eleven new tests locking the classifier +
the two detectors' mutual exclusion.
add_bounty dedups on (attacker_ip, bounty_type, full payload JSON).
Three fingerprint-family bounties (http_useragent, ip_leak,
http_quirks) were including method/path / header_count in their
payloads — fields that vary per request — so a scanner hitting 100
paths produced 100 rows instead of 1, which is what was swelling
AttackerDetail.
Payloads now carry identity-only fields:
- http_useragent: {fingerprint_type, value}. UA + path combinations
no longer collide; one row per distinct User-Agent string.
- ip_leak: {source_ip, real_ip_claim, source_header, headers_seen}.
One row per distinct (proxy source, leaked IP, leaking header)
triple; repeat hits with the same header on different paths dedup.
- http_quirks: {fingerprint_type, order_hash, order, casing_hash,
casing_category, stable_count, tool_guess}. No more header_count
(included volatile headers; Cookie-presence variance broke dedup).
Per-request context (path, method, etc.) was never load-bearing for
analysts — the logs table already answers "when + where" at
per-event resolution. The bounty table is for stable identity.
UI:
- FpHttpQuirks renderer drops the method/path footer line and the
header_count/duplicates tags; shows stable_count instead.
- LEAKED-IPs tooltip on AttackerDetail swaps "X on GET /path" for
"Leaked via X; source 203.0.113.42" — same information, stable.
Tests add a "payload stable across paths and methods" assertion on
http_quirks — locks the contract so a future regression that sneaks
a per-request field back in fails loudly.
Existing duplicate bounty rows don't retroactively collapse.
Dev: `decnet db-reset --i-know-what-im-doing drop-tables` and
restart. Prod: one SQL pass to dedup by (attacker_ip, bounty_type,
payload) — trivial but not automated.
Per-request HTTP fingerprint derived from the header dict we already
log. Captures:
- order_hash: SHA-256 prefix (16 hex) over the lowercased header-name
sequence, minus volatile/per-request headers (Content-Length,
Cookie, Authorization, XFF family, trace IDs). Stable identity for
a given client stack regardless of which target / path is hit.
- casing_hash: same shape but over the per-header casing category
(Title-Case / lower / UPPER / mixed). Attackers frequently spoof
User-Agent but forget their stack sends `user-agent` while browsers
send `User-Agent`.
- tool_guess: prefix match against curl / python-requests /
Go-http-client / nmap-nse signatures. Cheap, best-effort — the
hash is the hard signal.
- duplicates: reserved for when the HTTP template switches from
dict(request.headers) to a list form; today it always fires empty
because dict() collapses duplicates.
Payload is a fingerprint bounty (bounty_type="fingerprint",
fingerprint_type="http_quirks"). Bounty dedup collapses identical
hashes per attacker — one row per distinct fingerprint — so a chatty
scanner doesn't spam the vault, but a tool-chain change from the
same IP surfaces as a new row.
UI renderer (FpHttpQuirks) shows the two hashes, tool guess badge in
violet, casing/count tags, and a collapsible header-order list.
Added to the passiveTypes group so it nests with JA3/JA4L/etc. in
the AttackerDetail fingerprints panel.
One library note: the naive "title-case" classifier failed on tokens
like `X-Forwarded-For` because Python's "".islower() returns False
so `p[1:].islower()` rejects single-letter tokens like the `X`.
Fix: explicitly accept single-char tokens when uppercase.
Attackers routinely front their scanners with VPNs/proxies, so the
TCP source we log is the proxy egress, not the real host. But a
surprising number of attacker setups are misconfigured: the proxy
forwards the real IP in an X-Forwarded-For (or Forwarded / X-Real-IP
/ CDN-variant) header. From our side that's a free attribution leak.
New _detect_ip_leak extractor in decnet/web/ingester.py fires at
ingest time per HTTP request. Logic:
1. Require service=http, source_ip present, headers present.
2. If source_ip ∈ DECNET_TRUSTED_PROXIES (comma-separated IPs or
CIDRs) → legitimate reverse-proxy forwarding, skip.
3. Walk proxy-family headers in priority order: Forwarded (RFC 7239)
→ X-Forwarded-For → X-Real-IP → True-Client-IP → CF-Connecting-IP.
4. Extract the left-most parseable IP from the winning header.
5. If that IP differs from the TCP source → emit a bounty with
bounty_type="ip_leak" carrying {source_ip, real_ip_claim,
source_header, headers_seen, path, method}.
Storage is the existing Bounty table — no schema change; de-dup is
handled by Bounty's (attacker_ip, bounty_type, payload_hash) key, so
repeat requests with the same leaked IP don't spam.
AttackerDetail renders a warn-accent "LEAKED IPs:" row under ORIGIN
listing distinct real_ip_claim values; hover tooltip shows the source
header + path of the most recent leak. Only shown when at least one
ip_leak bounty exists.
RFC 7239 Forwarded parser handles the full vocabulary — bare IPv4,
IPv4:port, quoted, IPv6 in brackets, IPv6 with port — returning only
IPs that actually parse.
Closes DEVELOPMENT.md "Network Topology Leakage → X-Forwarded-For
mismatches". Phase 3 of the three-phase Attacker Intelligence series
(phases 1: scanned-vs-interacted, 2: PTR records already shipped).
DECNET_TRUSTED_PROXIES env shape matches THREAT_MODEL DA-08's
"revisit when verified-proxy config lands" note — same token set
future rate-limit work will consume.
Resolve each attacker IP's rDNS name once at first sighting, store on
Attacker.ptr_record, render on AttackerDetail under ORIGIN. Many
attackers run infrastructure with forgotten rDNS that instantly
identifies them once surfaced: scan-node-42.shodan.io,
shady-vps.leasecloud.net, etc.
Resolver lives in decnet/geoip/ptr.py — colocated with enrich_ip
because the shape matches (take an IP, return supplementary
metadata, never raise). Uses the OS resolver via socket.gethostbyaddr
offloaded to the default executor, wrapped with asyncio.wait_for
timeout=2s so a slow authoritative NS can't stall the profiler tick.
Profiler side: _WorkerState grows a ptr_attempted: set[str] bounding
resolution to once per worker lifetime. Cold-start batches resolve
concurrently (Semaphore(_PTR_CONCURRENCY=10)) so a backlog doesn't
serialize 2s ceilings. _build_record gains a keyword-only ptr_record
parameter that, when _UNSET, omits the key from the record dict —
upsert_attacker's attribute-merge loop then preserves whatever's
stored on the row. Explicit None is a "fresh failed attempt" signal
and gets written through.
Env kill-switch DECNET_PTR_ENABLED=false for locked-down deploys
where egress DNS is forbidden. Private / loopback / link-local /
multicast / reserved addresses short-circuit before any DNS call.
IPv6 reverse DNS works transparently through the stdlib resolver.
Schema change — run once on upgrade:
ALTER TABLE attackers
ADD COLUMN ptr_record VARCHAR(256) NULL DEFAULT NULL;
Or drop-and-recreate on dev boxes (db-reset's SQLModel.metadata-driven
table discovery now picks it up automatically since ba155b7).
tests/conftest.py disables DECNET_PTR_ENABLED globally for the same
reason it disables DECNET_GEOIP_ENABLED — unit tests must never hit
the network. tests/geoip/test_ptr.py re-enables explicitly via an
autouse fixture.
Adds a new card on AttackerDetail: SCANNED · N services | INTERACTED
WITH · M services. Distinguishes port-scanners (N high, M=0) from
actual engagement (M>0) at a glance — the analyst's first question
when triaging a new attacker row.
Classifier lives in decnet/correlation/event_kinds.py, a single
source of truth for the event-type vocabulary:
- INTERACTION_EVENT_TYPES — command-family (command/exec/query/...),
SMTP engagement (mail_from/rcpt_to/message_accepted), file/payload
activity (file_captured/upload/download_attempt/retr), pub/sub
(publish/subscribe), recorded TTY sessions.
- NOISE_EVENT_TYPES — DECNET-internal (startup/shutdown/parse_error/
unknown_*).
- Everything else defaults to scan. Conservative by design: new
template verbs show up as "scanned" until explicitly promoted.
Bucket logic: a service is "interacted" if ≥1 of its events
classifies as interaction; otherwise "scanned" if ≥1 scan event;
noise-only services drop. Disjoint by construction.
Deliberate no-schema path: compute on-the-fly in the detail endpoint
via SELECT DISTINCT service, event_type FROM logs. Small result set
(tens of pairs per attacker), cost is trivial vs. the existing
behavior/commands queries. Trade-off: one more DB round-trip per
detail view in exchange for zero ALTER TABLE migration pain and
immediate classifier-change feedback loop.
Profiler's _COMMAND_EVENT_TYPES stays as-is (strict subset of
interactions that carry executable text), with a comment pointing at
the new canonical module.
Closes DEVELOPMENT.md "Attacker Intelligence §Service-Level Behavioral
Profiling — Services actively interacted with".
test_lifespan_db_retry patched decnet.web.api.asyncio.sleep to skip the
DB-retry backoff. Problem: asyncio is a shared module — the patch leaks
to every caller that looked up asyncio.sleep via `import asyncio`,
including run_health_heartbeat's own sleep loop. That heartbeat task
spawns inside the same lifespan; with its sleep mocked, the while-loop
spins tight, starves cancellation, and leaves an orphan task that
pytest-timeout eventually signals — surfacing as the 'Task exception
was never retrieved' warnings the user saw when running the suite.
Fix: give decnet.web.api a local binding `_retry_sleep = asyncio.sleep`
for the DB-retry wait, and have the test patch that instead. Narrowly
scoped, no impact on asyncio.sleep callers elsewhere.
Test timing before: 12s with --timeout=10 (interrupted by signal).
Test timing after: 0.58s. Full tests/web slice: 27s → 7.1s with the
spurious warnings gone.
Before: if the bus was unreachable at worker start, we logged
"running in idle mode" once and parked on shutdown forever. systemd
doesn't guarantee bus is fully up before the webhook worker starts,
so a race on boot left the worker permanently dead until restart.
Now: wrap the whole bus-use in an outer reconnect loop.
while not shutdown:
try: connect()
except: sleep(RECONNECT_SECS) ; continue
try: run_with_bus(...) # heartbeat + dispatch
except: log+close ; reconnect on next iter
Clean consequence: if the bus dies mid-operation the dispatch loop's
subscriptions raise inside the consumer tasks, `_run_with_bus` exits,
the outer loop closes the stale connection and reconnects. No partial
state leaks across epochs — fresh bus, fresh subs, fresh heartbeat.
Interval is 60s by default, overridable via
DECNET_WEBHOOK_BUS_RECONNECT_SECS. Shutdown wakes the wait so
systemctl stop doesn't hang for a minute.
Test added: flaky get_bus that fails once, then returns a live
FakeBus — asserts retry + successful delivery.
get_app_bus() in decnet/bus/app.py already has a 2s backoff retry so
the FastAPI hot path self-heals; this commit brings the standalone
webhook worker in line with the same posture.
Add "webhook" to KNOWN_WORKERS + the start-all preferred order so the
Config → Workers panel picks up the row automatically: heartbeat
subscription, start/stop controls via the existing systemd helper
(decnet-webhook.service.j2 already lands via decnet init's unit
glob), and the status-dot lifecycle all come for free.
Placed between mutator and the swarm-only agent/forwarder/updater
trio — matches the intended startup sequence (bus → api → data-plane
workers → egress → swarm management).
No frontend change needed; Config.tsx reads the worker list
dynamically from GET /api/v1/workers.
The hardcoded _DB_RESET_TABLES tuple had drifted — session_profile,
smtp_targets, and webhook_subscriptions were all missing, so
`decnet db-reset --i-know-what-im-doing drop-tables` silently left
them behind. Running it on a post-webhook install then letting
SQLModel.metadata.create_all() re-create tables produced a partial
schema: old rows survived, new columns didn't land, and endpoints
500'd on the missing columns (e.g. auto_disabled_at after the
circuit breaker merge).
Replace the hardcoded list with `SQLModel.metadata.sorted_tables`,
reversed for DROP safety (children first). Any future model addition
is auto-enrolled — no manual step, no more drift.
No behavior change on reset semantics; the SET FOREIGN_KEY_CHECKS=0
fence still covers any edge case the sort order misses.
After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed
deliveries, the worker calls trip_webhook_circuit(uuid, ts) which
flips enabled=False and stamps auto_disabled_at. The worker sets its
reload flag so the next dispatch epoch stops consuming events for the
tripped sub entirely — one dead receiver can't poison the shared
egress pool anymore.
Operator clears the trip via PATCH — setting enabled=True when the
sub was previously disabled clears auto_disabled_at, zeros
consecutive_failures, and clears last_error. Admin-pause → re-enable
hits the same path harmlessly.
Three observable states now distinguishable in the UI:
- Active enabled=True, auto_disabled_at=NULL
- Admin-paused enabled=False, auto_disabled_at=NULL
- Tripped enabled=False, auto_disabled_at=<ts>
UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and
a "N TRIPPED" count in the page header. Hover tooltip tells the
operator how to reset ("Re-enable via Edit").
record_webhook_failure now returns the new consecutive_failures count
so the worker can compare against the threshold without a second
roundtrip. trip_webhook_circuit is idempotent — re-tripping just
re-stamps auto_disabled_at.
Closes THREAT_MODEL WH-02 and DEBT-037 §1.
WebhookResponse now carries a `warnings: list[str]` field. When the
subscription's URL starts with http://, an `insecure_url` advisory is
surfaced on every GET/CREATE without blocking the request. HMAC still
detects tampering regardless of transport — only read-confidentiality
is lost over plaintext — and test/dev environments without TLS stay
usable.
Matches the operator-trust posture already established by DA-06
(admin-on-admin protection is out of scope). The alternative — hard
rejection at admin time — was considered and declined; warning-plus-
visibility is the right shape.
THREAT_MODEL WH-03 accepted risk registered; revisit triggers are
multi-admin delegation, a regulated customer, or an operator ticket
asking for a DECNET_WEBHOOK_REQUIRE_HTTPS enforcement knob.
Introduces the `decnet webhook` long-running worker that consumes the
internal bus and POSTs matching events to configured subscriptions.
Design: one task per (subscription, pattern) pair. Each task opens
its own bus subscription, iterates events, and dispatches via the
shared deliver() client. No intermediate queue, no in-memory filter
matching — the bus's own pattern matcher is the filter. Reloads on
`system.webhook.subscriptions_changed` signals from the CRUD router,
with a 60s fallback timer in case a signal is lost.
Shutdown propagates via CancelledError on the outer task; all inner
subscription tasks are cancelled and awaited in a finally block.
Bus unavailable → worker stays up in idle mode per the DEBT-031
pattern, logging one warning.
Registered as a master-only CLI command (agents don't configure
webhooks — the subscription store lives on master). systemd unit
mirrors the profiler template; added to decnet.target Wants= list so
`systemctl start decnet.target` brings it up alongside everything
else. `decnet init` auto-picks up the new .service.j2 via its
existing `glob("decnet-*.service.j2")` sweep.
Introduces the webhook egress foundation — a new WebhookSubscription
table, admin-gated CRUD under /api/v1/webhooks, and the shared
delivery client that both the test-ping route and the upcoming worker
will use. No worker yet; this commit is API + model + client only.
Simple-mode enum (AttackerDetail / DeckyStatus / SystemStatus) expands
to bus-topic patterns at the router layer; storage is always the raw
pattern list. Advanced mode lets admins supply raw NATS-style patterns
directly. Filter-at-subscribe: the worker (next commit) will subscribe
to the union of patterns across enabled subscriptions.
Delivery client handles HMAC-SHA256 signing (X-DECNET-Signature),
retry on 429/5xx/network errors with jittered backoff, no-retry on
4xx. Secrets never leave the server on GET/LIST — only the create
response carries the secret for copy-out.
CRUD routes publish WEBHOOK_SUBSCRIPTIONS_CHANGED on the bus after
every mutation so the (future) worker can hot-reload.
Opens DEBT-037 for the deferred items (circuit breaker, dead-letter,
batch delivery, payload templates, secret-at-rest).
New decnet/web/sse_limits.py provides sse_connection_slot, an async
context manager that counts live SSE connections per user UUID and
raises 429 when a per-user cap is exceeded (default 5, override via
DECNET_SSE_MAX_PER_USER). Wired into both SSE generators as their
first async with, so the cap check fires before any stream data is
yielded.
The cap must sit inside the generator — StreamingResponse returns
before the generator body runs, so a handler-level wrapper would
release the slot immediately. Put prefetch + slot + loop all under
the one async with.
Also documents F6/I (role leakage) as mitigated-by-construction via
handler docstrings: every event type on both streams wraps data
already reachable via viewer-gated REST, so no per-event filter is
needed until a new event family is introduced. The invariant is
written into the handler docstrings so a future PR can't silently
add admin-only events.
Resolves THREAT_MODEL F6/I and F6/D.
Every mutation route that returned an untyped dict now declares
response_model at the decorator. MessageResponse covers the eight
{"message": ...} envelopes (change-password, mutate-decky, mutate-
interval, update-deployment-limit, update-global-mutation-interval,
delete-user, update-user-role, reset-user-password). Purpose-built
models cover the richer shapes (DeployResponse for /deckies/deploy,
PurgeResponse for /config/reinit, ReapReportResponse for /reap-orphans,
UserResponse for /config/users). 204-No-Content and Response/
ORJSONResponse routes stay as-is.
The wire shape for clients is unchanged — the envelopes already only
shipped a message field. What changes is that a handler which
accidentally returns a richer dict (e.g. a full user row including
password_hash) would be silently stripped to the declared fields at
serialization time.
Also flips F4/D "expensive LIKE" to accepted (new DA-09) — the /logs
and /attackers search routes LIKE-scan unbounded columns, but both are
admin-gated, limit-capped, and operator rate-limit scope per DA-04.
FTS5 stays a performance TODO, not a security blocker.
The other five query endpoints (/logs, /attackers, /attacker-commands,
/bounties, /topologies/{id}) already declared le=2147483647 on offset;
these two were inconsistently uncapped. Bring them in line to close
the F4/D deep-pagination row.
Also resolves F4/T (ORM sort injection — already mitigated by the
regex pattern on /attackers sort_by, no other route accepts a column
name) and F4/D (limit cap — already universal) with code pointers.
Harden the attacker-controlled artifact download path (F7) with explicit
response headers instead of relying on Starlette's defaults (which only
emit attachment for non-ASCII filenames and never set nosniff). Also
resolves the THREAT_MODEL F7 path-traversal row (containment check was
already in _resolve_artifact_path) and the fleet-deploy detail=str(e)
audit (all four sites are admin-gated deliberate validator UX or
structured worker-response fields).
The ~30-signature hand-rolled p0f-lite table in decnet/sniffer/p0f.py
misses most real-world attackers (yesterday's SLOW SCAN being a
textbook case — 9 hours of events, 19 hits, os_guess = NULL). The
375-sig vendored p0f v2 DB was already there; this commit actually
calls it.
New resolution chain in sniffer_rollup:
1. Enabled OS-fingerprint providers (p0f-v2 default, via
DECNET_OSFP_PROVIDERS) tried in declared order. Provider with
highest-confidence match across all enabled sources wins.
2. Modal os_guess label from the sniffer's hand-rolled p0f.py.
Kept as fallback because v2's DB predates post-2006 kernels.
3. TTL bucket (linux / windows / embedded). Coarse but never wrong.
Wiring details:
- _match_via_osfp_providers: never raises — factory / provider
failures collapse to None and the chain falls through to the
old modal-label / TTL path. A corrupt .fp file or misconfigured
DECNET_OSFP_PROVIDERS must never wedge a profile rebuild.
- tcp_fp_context tracks whether the LATEST tcp_fp snapshot came
from a passive SYN ('syn' → p0f.fp) or an active prober probe
('synack' → p0fa.fp). Routes to the right sig list.
- initial-TTL normalisation via decnet.sniffer.p0f.initial_ttl.
Observation's TTL may be N hops below the OS's initial; v2
signatures match on the canonical bucket.
Soft-field semantics on Signature.score(): df and total_len are now
skip-checked when the observation is missing them. Sniffer doesn't
currently emit either SD field; a literal-constraint sig
shouldn't hard-reject a match solely because of upstream
incompleteness. Hard fields (window, ttl, options_sig, quirks)
still hard-reject on absent/mismatched input — those are the real
discriminators. Promote df / total_len back to hard the moment the
sniffer starts emitting them.
+2 integration tests on TestSnifferRollup, +2 soft-field tests on
test_signature. Full regression: 166 tests across tests/prober/osfp
+ tests/profiler all green.
- decnet/prober/osfp/p0f/provider.py: P0fV2Provider loads the four
vendored .fp files into per-context signature lists (syn / synack /
rst / stray) and matches via highest-specificity score across the
relevant list. Also auto-picks up p0f-decnet.fp if present (GPL-3.0
additions land there later, empty for now).
- decnet/prober/osfp/factory.py: get_provider / get_all_providers /
reset_cache, mirrors decnet/geoip/factory exactly. Env-dispatched
via DECNET_OSFP_PROVIDERS (default "p0f-v2"). Reserved names
"nmap-osdb" (pending Fyodor's grant) and "decnet-observed" (our
future curated DB) raise NotImplementedError — visible on the
factory surface so a typo doesn't silently fall through.
- decnet/prober/osfp/__init__.py now re-exports the public API so
callers use `from decnet.prober.osfp import get_provider` without
reaching into submodules (upholds the provider-subpackage rule).
15 new provider+factory tests covering:
- All four DB contexts load (262/61/46/6 sigs per inventory).
- Known-good Linux 2.6 SYN + Linux 2.2 SYN-ACK match end-to-end.
- Unknown observations / contexts return None, not raise.
- Factory memoises, env override honoured, unsupported names raise.
- Reserved names raise NotImplementedError (not silent None).
`sniffer_rollup` wiring lands in the next commit.
First code layer of the OS-fingerprinting work on top of yesterday's
vendored p0f v2 database. Three new modules, all pure (no I/O outside
of the parser's file read):
- decnet/prober/osfp/base.py — Provider protocol + OsMatch dataclass
matching the established Provider convention in decnet/geoip and
decnet/bus. Docstring spells out the never-raise invariant: malformed
input returns None, so a single bad event can't wedge a whole
attacker-profile rebuild.
- decnet/prober/osfp/p0f/signature.py — Signature dataclass + three
predicate helpers (WindowSpec / IntSpec / OptionToken) encoding the
p0f v2 DSL's wildcard / modulo / MSS-multiple / MTU-multiple
semantics. Scoring is our extension on top of upstream p0f's
first-match-wins policy: each signature carries a precomputed
specificity in [0, 1] so the factory can pick the most-specific
match when multiple signatures fire against one observation.
- decnet/prober/osfp/p0f/format.py — .fp line parser. Every shipped
field variant from the DSL spec at the top of p0f.fp is covered
(Snn / Tnn / %nnn / * for window; T0 vs T; -/@/* os-genre prefixes;
quirks as concatenated single-letter flags; '.' sentinels for
no-options / no-quirks). Malformed lines log a warning and skip
instead of aborting the whole file — 1 bad row must not cost the
other 374.
20 parser tests + 14 scoring tests. Full vendored-DB smoke tests
confirm all 375 signatures parse round-trip (262 SYN + 61 SYN-ACK +
46 RST + 6 stray) and every computed specificity lands in [0, 1].
Ships the p0f v2.0.8 signature database for passive + active OS
fingerprinting. 375 total signatures across four probe contexts:
- p0f.fp (262 sigs) — passive SYN fingerprints
- p0fa.fp ( 61 sigs) — SYN-ACK response, for active probes
- p0fr.fp ( 46 sigs) — RST response quirks
- p0fo.fp ( 6 sigs) — "stray" packet fingerprints
Replaces reliance on the 10-signature hand-rolled p0f-lite table in
decnet/sniffer/p0f.py for any match job the upstream DB covers.
Keeping the hand-rolled table as a fallback for modern kernels the
v2 DB pre-dates — v2 froze in 2006 so post-Win10 / post-Linux-3.x
kernels won't match against upstream directly. DECNET-authored
additions will go in a sibling p0f-decnet.fp under GPLv3 (not yet
committed; added as the ingester observes real honeypot traffic).
Provenance (full chain in data/README.md):
- Source: Debian snapshot of p0f_2.0.8.orig.tar.gz
- SHA1 matches Debian-recorded 7b4d5b2f24af4b5a299979134bc7f6d7b1eaf875
- Files byte-identical to upstream tarball (verified by hash)
License chain:
- Upstream: LGPL-2.1 (doc/COPYING preserved verbatim as
data/LICENSE.p0f-upstream, Michal Zalewski's copyright intact).
- DECNET uses the LGPL-2.1 §3 explicit permission to convert to any
version of the GPL. These files, as consumed in DECNET, are
effectively GPL-3.0. Chain documented in data/README.md so an
auditor sees the full reasoning.
- LGPL-2.1 → GPL-3.0 §3 conversion is a settled compat path; same
mechanism the kernel uses for LGPL userland glue and many other
projects apply daily.
Rejected path — nmap-os-db under NPSL — because NPSL adds
restrictions GPLv3 §7 prohibits us from accepting. An email is out
to Fyodor requesting an open-source-author exception grant, but we
don't block on it: p0f v2 is a genuine accuracy improvement in
its own right, and adding nmap-osdb later (if granted) plugs into
the same provider interface with zero refactor.
Directory layout mirrors the established provider-subpackage pattern
(see decnet/geoip/, decnet/bus/) per the feedback_provider_
subpackages memory: base + factory + impl/ subpackages, no flat
files. Parser + matcher + factory wiring land in the next commit
sequence.
Follow-ups on 9232031 per review:
- Module-level constants KD_PAUSE_BURST_MAX_S (0.2s),
KD_PAUSE_THINK_MAX_S (1.5s), KD_START_OF_ACTION_IDLE_S (2.0s).
Docstrings reference them by name; future calibration against real
session data only has to touch one place. Threshold for "started
a new action" raised from 1s → 2s — 1s catches too much
mid-command hesitation to be empirically bimodal.
- New column kd_max_pause_gap (seconds). The distracted bucket count
alone can't distinguish one 3s pause from three 60s pauses;
max-gap carries that signal in one cheap scalar (vs widening the
histogram to a fourth bucket).
- Scope-framing docstring above the whole kd_* section: intended
use is session clustering / tooling attribution, explicitly NOT
biometric identity, admission decisions, or ML-driven user ID.
Keeps a future well-intentioned contributor from walking the
project into legal/ethics territory by accident.
- TODO comment on kd_top_bigrams: v1's JSON-in-TEXT is fine for
"show the top digraphs on the attacker page". If bigram-similarity
queries become hot, promote to a session_bigram_stats(sid, bigram,
count, mean_iat_s) table or Postgres JSONB + GIN. Neither changes
the write-side ingester materially.
No new migration helper — pre-v1 schema additions go through
create_all on fresh DBs; the existing _migrate_session_profile_table
stays but does not get extended. Alembic lands at v1 and sweeps all
the ad-hoc migrations at once.
Adds the three signal columns motivated by the manual keystroke
analysis in DEBT-036 directly to the SessionProfile table. Pre-v1 so
we modify the schema in place — Alembic arrives at v1.
Columns:
- kd_top_bigrams (TEXT) — JSON of top-N most-common digraphs with
mean IAT per bigram. Complements kd_digraph_simhash ("same typist?")
with "same typist in same mental state?" (tired / rested / distracted
shifts bigram-specific IATs measurably).
- kd_start_of_action_latency (REAL/DOUBLE) — median IAT of the first
keystroke after an idle gap > 1s. Separates "initiating a command"
from "executing a remembered one"; real humans have measurable
start-of-action latency, bots don't.
- kd_pause_hist_burst / _think / _distracted (INT) — three-bucket
histogram (counts, <0.2s / 0.2-1.5s / >1.5s). More discriminating
than the existing flat burst_ratio / think_ratio pair: C2 operators
concentrate in burst with a thin tail; opportunistic humans have a
fat think bucket and a long distracted tail.
Both backends get an idempotent ADD COLUMN migration
(_migrate_session_profile_table) wired into initialize() alongside
the existing _migrate_attackers_table path — guards on PRAGMA
table_info (SQLite) / information_schema.COLUMNS (MySQL) so reruns
are safe.
PII discipline comment on kd_digraph_simhash and kd_top_bigrams:
both operate on bigram CHARACTERS, never on raw input stream content.
Attacker passwords typed over SSH must not land here.
Test updated for the MySQL initialize() migration-order contract.
sessrec.c emits the session_recorded SD blob with sid/service/src_ip/
duration_s/bytes/truncated — it never emitted shard_path. The web
handler still asked for fields.shard_path, got "", tripped the
sessions-YYYY-MM-DD.jsonl basename regex and returned
400 "invalid shard name" for every legitimate transcript request.
Handler now:
- Fast-paths when fields.shard_path IS present and validates
(for any future emitter or ingester that backfills it).
- Otherwise enumerates sessions-YYYY-MM-DD.jsonl shards under
ARTIFACTS_ROOT/{decky}/{service}/transcripts/ (newest first) and
returns the first one whose per-sid index contains our sid.
- Security invariant preserved: only files whose basename matches the
_SHARD_BASENAME_RE are ever opened, and they always resolve inside
ARTIFACTS_ROOT. A forged fields.shard_path is silently ignored.
- Soft-fails OSError/PermissionError on the transcripts dir (decky
containers often write it with a uid the API can't read) — returns
404 instead of a 500 traceback.
test_forged_shard_path_blocked updated to match the new semantics:
forgery is ignored, the real shard is served via fallback. The
invariant (no /etc/passwd access) is still asserted by the fact
that status is 200 with data from the test shard.
polkit rule 50-decnet-workers.rules hardcoded isInGroup("decnet"),
so when 'decnet init --group anti' installed systemd units as
User=anti / Group=anti, the API (running as anti) could no longer
systemctl start/stop decnet-*.service — polkit fell back to
'interactive authentication required', which in a daemon context is
a hard fail:
START FAILED · COLLECTOR — Failed to start decnet-collector.service:
Access denied as the requested operation requires interactive
authentication.
Rename the rule to .j2, parameterise the group on {{ group }}, and
route _install_polkit through _render_template /
_write_rendered_if_changed. Now the polkit rule matches whatever
group was passed to 'decnet init'.
Test fixture updated to seed the .j2 variant.
_configure_logging opened InodeAwareRotatingFileHandler against
DECNET_SYSTEM_LOGS (default: relative decnet.system.log) without
guarding OSError. Under systemd with ProtectSystem=full +
ProtectHome=read-only and no writable path baked into the unit, the
first import of decnet.config raised OSError and the daemon died
before it could even print a useful error — the root-cause log line
showed up in journalctl as a stack trace rather than a warning.
Wrap the handler attachment in try/except OSError and log a single
WARNING via the already-installed stream handler. stderr is always
attached, so losing the file handler means operators tail
journalctl / docker logs instead — the daemon keeps running.
The API lifespan unconditionally spawned log_collector_worker,
appending every container line to DECNET_INGEST_LOG_FILE. On hosts
that also run decnet-collector.service (installed by 'decnet init')
that's two tailers writing the same events to the same file — the
ingester then inserts each event twice and the dashboard shows every
command duplicated.
Add DECNET_EMBED_COLLECTOR (default false), matching the existing
DECNET_EMBED_PROFILER and DECNET_EMBED_SNIFFER pattern directly
above this block. Single-process dev setups without systemd can flip
it on to restore the all-in-one behaviour; multi-process production
gets the single-writer invariant by default.
Every plain `decnet deinit` ran userdel + groupdel unconditionally. In
dev the operator may pass `--user $USER --group $USER` to avoid file
ownership churn against a source checkout — at which point deinit
would cheerfully delete their own login account.
Move user/group removal behind --purge, matching the existing
behaviour for /var/lib/decnet + /var/log/decnet. Help text updated:
--purge now clearly advertises that it also wipes the service
user/group, with an explicit warning to only run it when `decnet init`
created the account in the first place.
Test updated: plain --deinit must NOT invoke userdel/groupdel;
--deinit --purge must.
Every decnet-*.service.j2 hardcoded User=decnet / Group=decnet. The
init CLI accepted --user / --group and used them for useradd,
chown, /etc/decnet ownership and ReadWritePaths — but the Jinja
context omitted them entirely, so
sudo decnet init --install-dir $PWD --user anti --group anti
rendered
User=decnet
Group=decnet
into every unit, which at best ran the workers as a user that didn't
match the files (fails to read the venv / config), and at worst spun
a parallel system user the operator never asked for.
Swap the hardcoded lines to {{ user }} / {{ group }} across all 13
templates and add both to the Jinja context in _install_units.
The systemd unit templates hardcoded {{ install_dir }}/venv/bin/decnet.
On production hosts enroll_bootstrap.sh creates exactly that path so it
worked. On dev boxes where the operator runs `sudo decnet init` against
a source checkout with a differently-named venv (.venv, .311, .312),
every decnet-*.service looped forever in auto-restart with:
Failed at step EXEC spawning .../venv/bin/decnet: No such file or
directory
Templates now use {{ venv_dir }} as an independent Jinja2 var. `decnet
init` adds --venv-dir (explicit override), otherwise autodetects:
1. $VIRTUAL_ENV (only when inside --install-dir, so a user-home venv
never gets baked into a root-owned unit),
2. {install_dir}/venv (production default; what enroll_bootstrap
creates),
3. {install_dir}/{.venv,.311,.312,.313} (common dev conventions).
Init aborts before any file writes if nothing resolves — an
operator-friendly error beats journalctl spam on every unit restart.
python3-venv doesn't set a persistent system variable — $VIRTUAL_ENV
lives in the activated shell only — so this has to be decided + baked
in at init time; there's no way for systemd to "inherit the current
venv" at unit start.
Test mode (--prefix) skips venv validation so the existing test suite
doesn't need to stub up a venv tree per case.
'decnet status' used to psutil-scan for cmdlines matching hand-coded
service launch args. That worked on dev boxes running workers via
'python -m decnet.cli ...' but missed the systemd reality on real
hosts: units may be installed but not started, failed, or in
auto-restart — all invisible to a cmdline grep.
New behaviour: status calls `systemctl list-units --type=service --all
--output=json 'decnet-*.service'` and renders the unit/load/active/
sub/description matrix. One view works for masters, agents, and
mixed hosts — iterates over whatever 'decnet-*' units were installed
by 'decnet init' / the enroll-bundle. Agent/master mode filtering is
no longer needed in the CLI; the host literally does not have
master-only units installed if it enrolled as an agent.
The psutil path survives as a fallback for boxes without systemd
(dev laptops, CI containers, minimal init systems) so the command
stays useful there. Clearly labelled 'psutil fallback' in the table
title so operators know which view they're looking at.
Locust spawns N virtual users (default 1000), all from 127.0.0.1 as
admin. /auth/login is rate-limited 10/5min per-IP AND per-username, so
the 11th on_start() got 429 and a RuntimeError. A @task(2) login in
the task weights turned the whole run into a 429 factory even after
ramp-up. And _login_with_retry treated 429 as non-retryable, so there
was no graceful degradation path.
Three changes, one root cause:
- decnet/web/limiter.py: read DECNET_LIMITER_ENABLED (default true).
When false, slowapi's Limiter(enabled=False) makes @limiter.limit a
no-op. Default ships unchanged; nobody should ever release with this
off.
- tests/stress/conftest.py: set DECNET_LIMITER_ENABLED=false in the
uvicorn subprocess env. Stress tests measure throughput, not rate
limiting.
- tests/stress/locustfile.py: drop the @task(2) login — it added zero
coverage (every user already logs in at on_start) and only generated
contention. Teach _login_with_retry to honour 429 + Retry-After so a
Locust pointed at a limiter-enabled server degrades gracefully
instead of crashing on_start.
MySQL can't index a BLOB/TEXT column without a prefix length, so
create_all() on a fresh MySQL schema blew up with "BLOB/TEXT column
'kd_digraph_simhash' used in key specification without a key length".
SimHashes are a fixed 8 bytes — the variable-length type was a
SQLAlchemy-side auto-mapping from 'Optional[bytes]', not an actual
schema requirement. Switch to BINARY(8), which is portable: MySQL gets
a fixed-width indexable BINARY, SQLite treats it as BLOB and doesn't
care about key length.
Adds instance_seed.py to every service template (conpot, docker_api,
imap, k8s, llmnr, pop3, rdp, sip, smb, snmp, ssh, telnet, tftp, vnc).
Derives a stable per-instance seed from NODE_NAME (+ optional
INSTANCE_ID) and exposes deterministic helpers for the boring details
scanners would otherwise use to fingerprint the whole fleet as one
machine: cluster UUIDs, auth salts, uptime fixtures, minor version
strings. Connection-time jitter is intentionally NOT seeded — two hits
to the same decky must not replay the same latency curve.
Identical source across every template; lives next to each service so
the Docker build context picks it up without a shared package-data hop.
Exclude lists fail open — anything new at the master's repo root (venvs,
logs, dev notes, .env.local, local DB dumps) silently leaks into every
agent bundle. On this box a stray .311 venv (335 MB) + logs/ (220 MB)
bloated the tarball to ~150 MB and blew test_enroll_bundle timeouts.
Replace _EXCLUDES + _is_excluded with _INCLUDED_ROOT_FILES +
_INCLUDED_DIRS + _EXCLUDED_DECNET_SUBTREES and iterate via os.walk with
in-place dirnames[:] pruning so master-only subtrees (decnet/web,
decnet/mutator, decnet/profiler) and __pycache__ aren't descended into
at all.
Bundle contents are now strictly: pyproject.toml + the decnet/ package
minus the three master-only subtrees. Synthetic entries (INI, certs,
systemd units) unchanged — they were always added inline, not from the
tree walk.
test_enroll_bundle.py: 20/20 pass in 24s (was timing out at 15s/test).
Populates Attacker.country_code + country_source (MVP) using the five
RIR delegated-stats files (ARIN/RIPE/APNIC/LACNIC/AFRINIC). Offline,
license-free, no outbound traffic that could burn honeypot stealth.
- decnet.geoip package with factory/base/lookup + rir/ subpackage
(fetch/parse/provider) mirroring the db + bus factory convention
- Profiler._build_record calls enrich_ip on every upsert
- Idempotent ALTER TABLE migrations for both SQLite and MySQL
- decnet geoip refresh/lookup CLI (master-only)
- /var/lib/decnet/geoip seeded by decnet init
- DECNET_GEOIP_ENABLED=false kill-switch; set in tests/conftest.py so
unit tests never trigger the first-access fetch
The config file `decnet init` dropped at /etc/decnet/config.ini was a
stub with a single [decnet] header saying 'reserved for future structured
settings.' Admins who wanted to tune DECNET_API_HOST, DECNET_DB_URL,
DECNET_BATCH_SIZE, etc. had to hunt env.py for the exact variable name
and drop it in .env.local.
Changes:
- decnet/config_ini.py — adds a _DOMAIN_MAP translation table covering
[api], [web], [database], [bus], [swarm], [logging], [ingester],
[tracing]. Loads regardless of mode; unknown keys inside a known
section log a WARNING (operator typos shouldn't be silent).
Explicit key map (not auto kebab-to-snake) so [web] admin-user lands
in DECNET_ADMIN_USER without silently renaming the env-var contract
consumers import from decnet.env.
- decnet/cli/init.py — renames the placeholder target config.ini →
decnet.ini (unifies with the name already used by load_ini_config and
the enroll bundle's _render_decnet_ini). Placeholder body now shows
every domain section as a commented example so admins learn the
shape by reading. Deinit removes both decnet.ini and the legacy
config.ini so upgrading hosts leave no orphan file.
Precedence is unchanged: real env > INI > built-in default in env.py.
os.environ.setdefault means systemd EnvironmentFile= and one-off
DECNET_FOO=bar decnet ... invocations always win.
Secrets explicitly NOT moved to the INI:
- DECNET_JWT_SECRET
- DECNET_ADMIN_PASSWORD
- DECNET_DB_PASSWORD
They stay in .env.local / EnvironmentFile= — never in a group-readable
INI, never in a diff, never on the dashboard.
Dev/profiling flags (DECNET_DEVELOPER, DECNET_EMBED_*, DECNET_PROFILE_*)
also stay env-only per maintainer direction — dev knobs shouldn't
be one 'I'll flip this for tonight' away.
Tests: +5 in test_config_ini.py (domain sections load regardless of mode,
env beats INI for domain keys, unknown key warns, absent section is
no-op, role section beats domain section via setdefault precedence). +1
in test_init.py (placeholder writes decnet.ini with every section
header present as commented guidance).
31 tests pass across the two files (was 26).
Distros reserve /opt for different things (some package managers own it
outright), and a DECNET install that wants to live at /srv/decnet or
/usr/local/decnet had to hand-edit 13 service files post-install.
Converts every deploy/decnet-*.service to a .j2 template keyed on
{{ install_dir }}, rendered by `decnet init` at install time. All other
paths (log_dir, state_dir, runtime_dir, user, group) stay standard —
only install_dir varies.
Changes:
- deploy/decnet-*.service → deploy/decnet-*.service.j2 (13 files).
- decnet init gains --install-dir (default /opt/decnet, preserves
existing behaviour byte-for-byte). Validates absolute-path at the
CLI boundary. Threads through useradd --home-dir and the dir-creation
list so the filesystem layout matches the rendered templates.
- _install_units renders via Jinja2 with StrictUndefined (typo → loud
error, not a silent broken unit). SHA over rendered output so
operators with a custom install_dir get idempotent re-runs.
- decnet.target, tmpfiles.d, polkit rule stay static — they don't
reference install paths.
- 4 new tests: custom install_dir renders into units, default remains
/opt/decnet, relative paths rejected, second run with same custom
dir is idempotent.
Worker bus instances (collector, ingester) close their private buses
in finally blocks on shutdown, but stream threads holding closure
references kept calling publish after close — one `RuntimeError:
publish on closed bus` per stream line, caught by publish_safely
and logged per call, flooding server logs.
Changes:
- `UnixSocketBus.publish()` now drops post-close calls. First drop
WARNs loudly (bus is critical infra — silent drops would hide real
problems); subsequent drops on the same instance log at DEBUG to
prevent the flood. Sticky `_closed_publish_warned` flag, reset
naturally per new bus instance.
- `make_thread_safe_publisher` short-circuits on a closed bus before
marshalling a coroutine onto the loop. Avoids the wasted scheduling
work in the hot shutdown path.
Degradation is safe: callers go through `publish_safely`, which
already treats exceptions as 'dropped notification, DB is source of
truth.' We just stop manufacturing the exception in the first place
for a known-benign condition.
A startup race between `decnet bus` being ready and the API's lifespan
hitting `get_app_bus()` at api.py:135 would set `_tried = True`
permanently, poisoning the singleton for the rest of the process: the
dashboard shows BUS OFFLINE, topology SSE falls into the bus-is-None
snapshot-only branch, mutator publish calls no-op. Only an API
restart recovered.
Replaces the one-shot veto with a time-gated retry keyed on a
`_last_failure_ts` monotonic timestamp plus a 2 s backoff. Publishers
on the hot path still pay at most one connect attempt every 2 s when
the bus is down, but the singleton auto-recovers within 5 s (one
dashboard poll) once the bus comes up.
The asyncio lock still serialises concurrent callers so the bus server
doesn't get stampeded with parallel connect attempts on startup.
Registers a generic @app.exception_handler(Exception) that catches anything
uncaught in route handlers / dependencies. Prod response is opaque:
{detail: 'Internal Server Error', error_id: <uuid4 hex>}. Dev mode
(DECNET_DEVELOPER=True) adds exception_type and traceback fields so
failures are debuggable without tailing server logs.
The error_id is logged alongside the full traceback server-side, letting
operators correlate a user's 500 report with the exact exception via
`grep <error_id> /var/log/decnet.log`.
FastAPI's own HTTPException routing and the existing
RequestValidationError / ValidationError / RateLimitExceeded handlers
still take precedence — this handler only fires on genuinely-uncaught
exceptions.
Flips threat model F1/I 'traceback / stack trace leakage' from ? to M
and logs a follow-up checklist entry for 4 detail=str(e) sites in the
fleet deploy router (admin-gated, different threat class, separate
audit).
Adds slowapi two-bucket rate limit on /auth/login — 10 attempts per
5 minutes per-IP AND per-username, tripping either → 429. Per-IP
catches botnets hitting one account; per-username catches distributed
credential stuffing against one account. In-memory storage: dashboard
API is single-process, Redis is disproportionate for v1.
X-Forwarded-For is deliberately NOT trusted (spoofable); reverse-proxy
deployments get one shared bucket per proxy IP. Logged in the threat
model as accepted risk DA-08, to be revisited when a verified-proxy
config lands.
Also scaffolds development/THREAT_MODEL.md with STRIDE-per-element
methodology, system-context DFD, and Dashboard↔API as the first fully
worked component (7 sub-flows, ~50 threat entries). F1 Authn ships
with 3 threats mitigated: rate limit (new), uniform 401 (verified
already in place), bcrypt length clamp (verified already in place via
Pydantic max_length=72).
Adds GET /attackers/{uuid}/smtp-targets (viewer) and GET /attackers/{uuid}/mail
(admin) endpoints, plus two new sections on the attacker detail page:
VICTIM DOMAINS rollup (aggregate-only, federation-gossip-safe) and STORED MAIL
with a drawer that decodes headers, lists attachments, and downloads the raw
.eml via the existing artifact endpoint (?service=smtp).
New SmtpTarget table records each (attacker, domain) pair observed via
the SMTP honeypots. Only the domain is stored — local-parts are dropped
at ingestion, so this table holds no user-identifying data beyond the
target organisation's identity.
The profiler worker extracts domains from rcpt_to / rcpt_denied /
message_accepted events, normalizes them (lowercase, strip local-part,
drop blocked TLDs), and upserts one row per pair with a running count +
first_seen / last_seen.
Three repo methods shipped:
* increment_smtp_target(attacker, domain) — upsert + bump
* list_smtp_targets(attacker) — per-attacker view
* smtp_target_seen(domain) — cross-attacker aggregate, shaped as the
federation-gossip RPC that V2 will expose.
The gossip-query shape is load-bearing: each operator can answer
"have any of your attackers targeted corp1.com?" without leaking
which attackers or when — the aggregate returns a bool + total count
+ first/last seen, nothing else.
SMTP template now writes each accepted DATA body as a .eml file into a
bind-mounted per-decky quarantine dir and emits a `message_stored` log
with sha256, size, decoded headers, and an attachment manifest
(filename + sha256 + size + content-type). Attachment hashing uses the
*decoded* payload so operators can match against VT / MalwareBazaar
directly. Body accumulator is capped at SMTP_MAX_BODY_BYTES (default
10 MB, matching the EHLO SIZE advert) so a streaming client can't OOM
the container.
The existing /api/v1/artifacts/{decky}/{stored_as} endpoint now takes
an optional ?service= query param (defaults to ssh for back-compat)
and can serve .eml files out of the smtp subdir. Forensic metadata
rides the normal log pipeline, same as SSH file_captured.
decnet/web/db/models.py was approaching 1000 lines across User/Log/
Attacker/Swarm/Topology/Workers/Updater/Health domains. Split into a
package with one module per domain; __init__.py re-exports every symbol
so all 52 call sites keep importing from decnet.web.db.models
unchanged.
New purpose-built table with schema_version column committed from day one
so V2 federation gossip can cluster sessions across operators without
retrofitting. Ships with the empty write path (upsert_session_profile);
ingestion of keystroke features (IKI moments, control-char rates, digraph
SimHash) is tracked as V2 work.
Closes gap #2 from SIGNAL_CAPTURE_AUDIT.md.