Commit Graph

670 Commits

Author SHA1 Message Date
efdaa87ee2 feat(web/mazenet): amber-tint pending LAN placeholders
Pre: optimistic placeholders for enqueued LAN-add mutations were
indistinguishable from regular not-yet-deployed nets — same dim
mono chrome, same dotted border. User couldn't tell whether a drop
had been queued or had silently failed and re-stacked over an
existing LAN.

Tag the placeholder with `pending: true`, render it in the same
amber the REAP button uses (var(--warn, #e0a040)) with a 'PENDING'
chip-mini in the head. Visual is loud enough that there is no
chance of confusion with INACTIVE (dimmed) or regular pending-state
LANs (mono).

Reconciliation is the existing refetch pumping setNets(h.nets) on
SSE — no extra plumbing needed; placeholders disappear naturally
when the mutator's applied event lands and the canvas re-hydrates
from the server.
2026-04-24 22:27:40 -04:00
bfb5d8c33c fix(web/mazenet): split Net.name (canonical) from Net.label (display)
Two bugs sharing the same root cause: Net only carried a label
string, set to lan.name.toUpperCase() everywhere. Backend mutator
ops look up LANs by canonical lowercase name, so passing the
uppercase label through attachEdge / detachEdge / addDeckyToLan /
deleteLan failed with 'LAN \\'SUBNET-XXXX\\' not found'.

Add Net.name (canonical, lowercase) alongside Net.label (display).
Every backend call site now passes name; toasts and drag ghosts
keep label.

Second bug — new LANs stacking on top of each other on live
topologies — fell out of the same UX path: createLan returns
'enqueued' when the topology is active/degraded, the existing
early-return skipped local-state insertion, so the next drop
recomputed the same grid index. Now we drop a placeholder Net
with id 'pending-lan-<name>' immediately on enqueue. Grid index
advances and the user gets a visual ack right away; SSE replaces
the placeholder by canonical id when the mutator applies it.
2026-04-24 22:21:55 -04:00
37050a4bcd fix(db): claim_next_mutation works on MySQL — derived-table workaround
MySQL ERROR 1093 forbids referencing the UPDATE target inside a
subquery; the existing UPDATE ... WHERE id = (SELECT id FROM
topology_mutations ...) form blew up on every mutation claim under
the MySQL backend, so no mutation ever progressed past pending.

Wrap the inner SELECT in a derived table (SELECT id FROM (...) AS
_next). MySQL materialises the derived rowset before applying the
UPDATE, sidestepping 1093. SQLite accepts both forms, so the
single-statement atomic claim semantics are preserved on both
backends — racing watchers still serialise correctly.
2026-04-24 22:15:23 -04:00
99bc9a8b6d fix(engine): offload blocking compose to a worker thread
deploy_topology and teardown_topology are async, but every
_compose_with_retry / _compose call inside them was running in the
main event loop via subprocess.run — which means a multi-minute
docker compose --build froze the entire API: other endpoints,
mutator events, SSE streams, status polls. The user noticed when a
2-decky deploy blocked everything else for the duration of the build.

Wrap both calls in anyio.to_thread.run_sync. Same pattern the
mutator engine has been using at engine.py:104 since forever.

Per-LAN bridge create/remove docker SDK calls are still synchronous
in the loop — they're individually fast (~50-200ms per LAN) and
the loops are bounded by topology size, so they don't dominate.
Worth revisiting if a 200-LAN deploy turns out to stall noticeably.
2026-04-24 22:14:08 -04:00
f8ef0a5cf1 fix(deploy): redirect DOCKER_CONFIG out of $HOME so ProtectHome doesn't kill builds
The api unit's ProtectHome=read-only made the user's HOME read-only
inside the unit's namespace. docker compose --build then tried to
write ~/.docker/buildx/activity/* and got EROFS — which we'd been
misdiagnosing as a buildx wedge for the last few iterations.

Real fix: set DOCKER_CONFIG and BUILDX_CONFIG in the unit's
Environment= to a path inside ReadWritePaths. Hardening stays on,
docker CLI writes to install_dir/.docker instead of /home/<user>/.docker.

The wedge classifier now detects this case (count==0 + /home/ in
the stderr path) and emits a recipe pointing at the env-var fix
instead of the driver-rebuild path. Test added.

Wiki gets the new branch first since it's the most common cause
on systemd-managed installs.
2026-04-24 22:07:13 -04:00
257624e6a7 fix(engine/buildx): recipe used reserved 'default' builder name
'docker buildx create --name default' errors with 'default is a
reserved name and cannot be used to identify builder instance'.
The bundled builder always exists under that name; the recipe
should switch to it (buildx use default), not try to recreate it.

For the count==0 driver-rebuild branch, the new builder needs a
non-reserved name — using 'decnet-builder' as the example.
2026-04-24 22:02:20 -04:00
40a31d8bc7 fix(engine/buildx): branch recovery recipe on leaked-mount count
The hint was one-size-fits-all and pointed at prune+restart even
when zero mounts were leaked — a false positive caused by matching
any stderr containing the activity-dir path.

Two changes:

1. Tighten the wedge classifier. Both the buildx-specific phrase
   ('failed to update builder last activity time') AND the EROFS
   marker ('read-only file system') must appear in stderr. Either
   alone is now treated as a normal transient error and retried.

2. Branch the recipe on _count_leaked_buildkit_mounts():
   * count > 0 → unmount loop + daemon stop + umount -l
     (prune+restart alone doesn't evict held mounts)
   * count == 0 → rebuild the buildx driver (rm builder state,
     buildx create --use, inspect --bootstrap)

Original compose stderr is now preserved in the hint as
'Original error: ...' so the user sees both the recipe and what
compose actually said.

Tests cover both branches plus a negative case (unrelated EROFS).
2026-04-24 21:58:09 -04:00
05d225ae38 fix(engine): surface CalledProcessError.stderr in deploy-failure log + status reason
str(CalledProcessError) is just 'Command ... returned non-zero exit
status N' — the stderr (where the buildx recovery hint lives) was
being silently dropped from both the deploy log line and the
persisted 'failed' status reason.

New _format_subprocess_error helper appends .stderr when the
exception is a CalledProcessError. Applied to transition_status
reason and the background-deploy log message so operators and the
UI see the real failure, not just the exit code.

This is what makes the buildx preflight hint from 86b9dec actually
reach the user.
2026-04-24 19:31:37 -04:00
86b9decf80 fix(engine): detect wedged buildx + surface recovery hint on deploy
When Docker's buildx leaks bind-mounts from a failed build it starts
reporting 'read-only file system' on its own activity file, even
though nothing is actually read-only. The user's host had 20+
leaked mounts before we noticed — each retry compounds the leak.

_compose_with_retry now:
 * Pre-flight counts /var/lib/docker/tmp/buildkit-mount* entries in
   /proc/self/mounts; if >= 10 and the command is a build, refuses
   to start and returns a clean recovery recipe instead of retrying.
 * On mid-build failures that match the wedge signature
   ('failed to update builder last activity time' or the activity-dir
   path in stderr), short-circuits the retry loop with the same
   recipe. The first occurrence no longer needs a pre-flight; the
   pre-flight catches repeat attempts.

Recipe points at 'docker buildx prune -af && sudo systemctl restart
docker', which is what actually clears the leaked mounts.

Tests cover all three paths: wedge preflight blocks builds, non-build
commands (down/stop) ignore the preflight, mid-build signature
detection kills the retry loop. A new autouse fixture stubs the
wedge-detector to 0 so dev-host state doesn't poison the mocked
subprocess tests.

Wiki companion commit adds Troubleshooting → 'Buildx leaked mounts'.
2026-04-24 19:25:45 -04:00
a8356407c5 feat(web/mazenet): cross-LAN port drag now creates a real bridge
Port-to-port edges previously lived only in the editor's local state
— the backend's edge model is decky<->LAN membership, so the deploy
validator still saw cross-LAN pairs as orphans. Drawing a line from
dmz-gateway to a decky in subnet-d6b2 did nothing that a later
DMZ_ORPHAN check could see.

Now onAddEdge inspects endpoints: same-LAN stays visual (no bridge
to create), cross-LAN calls attachEdge with the source decky and
the target LAN, multi-homing the decky so the validator's LAN
adjacency scan threads through it. The viz edge stores the returned
backendEdgeId; removeEdge detaches that membership before dropping
the local edge. Observed entities (attacker-pool) are read-only and
never bridge.

A toast ("BRIDGED <decky> -> <lan>") surfaces the backend-persistent
side of the gesture so the user knows it's not just a cosmetic line.
2026-04-24 19:18:02 -04:00
c214cdd7bb fix(api/topology): map duplicate-name IntegrityError to 409
POST /topologies raised a 500 with a raw SQLAlchemy IntegrityError
traceback when the name collided with an existing topology. Catch
the error at the router, verify it's the ix_topologies_name
constraint (so unrelated integrity failures still surface as 500s
with their real traceback), and return 409 with a helpful detail.

Test covers the create-then-duplicate-create flow.
2026-04-24 19:06:37 -04:00
9bed930497 perf(web/mazenet): auto-disable edge flow animation above 60 edges
The .maze-edge-dash CSS animation invalidates each path's bounding
box every frame. Inter-LAN paths span the viewport so invalidations
overlap, and past ~60 edges the compositor spends every frame
repainting — the dominant cost on the 12+ LAN screenshot, even
dwarfing pan-drag overhead.

Drop the animation class when edges.length > 60. Edges stay fully
visible and traffic-tinted, just static. A MOTION: OFF segment in
the status bar surfaces the auto-disable so it doesn't look like a
broken animation.

Threshold is a constant in Canvas.tsx; if it needs to become a
user toggle later, lift it to state + localStorage in one place.
2026-04-24 19:01:25 -04:00
f3408d5e62 fix(topology/allocator): widen default subnet base to /12 for mass-scale
A 30-LAN generate request already fits in 172.20.0.0/16, but trees
with depth/branching that multiply past 256 (e.g. depth=6,
branching=4 ≈ 5k LANs) hit AllocatorExhausted before the first
write.

SubnetAllocator now accepts a full CIDR base ("172.16.0.0/12" →
4096 /24s) in addition to the legacy two-octet shorthand ("172.20",
auto-lifted to /16). The parent must be ≤/24; a /24 base yields
exactly one slot. Iteration order is preserved for /16 bases so
existing topologies keep their third-octet sweep; /12 adds a
second-octet dimension underneath.

Defaults bumped to 172.16.0.0/12: TopologyConfig.subnet_base_prefix,
/next-subnet query param, and the mutator's add-LAN fallback. The
field pattern widens to accept CIDR. create-blank and manual LAN
CRUD still use "10.0" (lifts to /16) — one DMZ LAN per topology,
256 is plenty.
2026-04-24 18:57:55 -04:00
207f791684 perf(web/mazenet): ref-driven pan, memoized children, indexed edge lookup
Pan/zoom previously drove a full Canvas re-render on every mousemove
via setPan() — at 30 LANs that's ~1000 SVG paths and div cards
re-evaluating 60 times a second while you drag. The browser screamed.

Three fixes, one surgical pass:

1. Pan drag writes the translate/scale transform directly to the
   pan-layer DOM ref inside requestAnimationFrame; setPan is deferred
   to mouseup. Grid pattern attributes (x/y/width/height) get the
   same treatment so the backdrop stays glued to the canvas content.
   Wheel zoom, resetPan, and zoomBy also sync refs + fire a write so
   React-driven changes land in one frame.

2. Edge rendering swaps the nodes.find() inside .map() for a
   Map<id, node> built once per render — O(E) instead of O(E·N).
   NetBox + NodeCard are now wrapped in React.memo; Canvas hoists
   the setSelection closures into useCallback so memo can actually
   short-circuit instead of seeing a fresh prop every render.

3. Drag-a-single-node still mutates state and re-renders, but now
   only the moved node rerenders — the other 89 skip via memo.

Everything that reads panRef.current (toWorld, context menu, drop
targeting) still sees the live value during drag because we mutate
the ref synchronously on each mousemove; only React state is lazy.
2026-04-24 18:48:05 -04:00
c973ded2fc perf(web/icons): per-icon lucide imports via centralised alias
Route all lucide-react icon usage through a single src/icons.ts
re-export that imports each icon from its own per-icon module
(lucide-react/dist/esm/icons/<name>) instead of the barrel.

Bundle-size impact: none (29kB icons chunk unchanged — tree-shaking
was already effective with sideEffects:false). Dev-experience win:
Vite transforms 247 modules instead of 1848 because the dep
optimiser no longer pre-bundles the full lucide barrel — faster
cold start and HMR.

Ambient d.ts declares the wildcard module so TS accepts per-icon
imports; lucide ships .d.ts only for the barrel.

Seven icons were renamed upstream and still work through the barrel
via aliases (AlertTriangle -> triangle-alert, BarChart3 -> chart-column,
CheckCircle -> circle-check-big, Filter -> funnel, PlusCircle ->
circle-plus, Sliders -> sliders-vertical, UploadCloud -> cloud-upload,
Fingerprint -> fingerprint-pattern). Component call sites stay on
the legacy names; the renames live only in icons.ts.
2026-04-24 18:41:33 -04:00
52cbb01555 perf(web): lazy-load page routes + prefetch-on-hover
Switch all navigable route components to React.lazy() and wrap
<Routes> in <Suspense>. Dashboard/Login/Layout stay eager since
they're the shell.

Initial index bundle drops 246kB -> 34.67kB (gzip 10.5kB). Each
route becomes its own 8-51kB chunk, loaded on demand.

Nav hover/focus triggers prefetchRoute(path) which fires the same
dynamic import() specifier the bundler dedups against React.lazy,
so the chunk is warm by the time the user clicks. Avoids the
Suspense flicker that would otherwise show on every first nav.
2026-04-24 18:38:26 -04:00
7389ddb62c chore(web/build): split vendor chunks — 705 kB main bundle → 246 kB
Single-bundle build was tripping vite's 500 kB warning per chunk and
forcing every user to re-download the entire app on every deploy.
Manual chunks split the bundle along natural library boundaries so:

- Rarely-changing vendor libs (react-dom, react-router, lucide-react,
  asciinema-player) cache across deploys.
- App code lives in its own `index-*.js` that's the only chunk that
  changes when we ship feature work.

Split shape (manualChunks fn in vite.config.ts):
- charts   — recharts + d3-*
- player   — asciinema-player
- icons    — lucide-react
- router   — react-router / react-router-dom
- react-dom, react
- vendor   — everything else in node_modules

Resulting bundle sizes (gzip):
  index (app):       246 kB  (gz 63)
  react-dom:         182 kB  (gz 57)
  player:            176 kB  (gz 65)
  router:             42 kB  (gz 15)
  vendor:             36 kB  (gz 14)
  icons:              29 kB  (gz 10)

Every chunk under the 600 kB ceiling we now set explicitly. The old
~705 kB single-chunk deploy is gone. No code changes — config only.
2026-04-24 18:29:49 -04:00
aaac300cc4 tweak(web/ip-leaks): show only 1 IP inline, rest via + N more
Five was still too loud on AttackerDetail when rotation is in play.
One inline is enough to read at a glance; everything else goes
behind the expand button. Rotation tag keeps carrying the count so
no signal is lost.
2026-04-24 18:26:35 -04:00
c78ab032bd fix(xff): truncate LEAKED IPs + ROTATION badge for rotation attacks
`for i in $(seq 1 100); do curl -H "X-Forwarded-For: 191.100.20.$i" ...`
was dumping 100 distinct IPs into AttackerDetail's LEAKED IPs row,
drowning the rest of the ORIGIN section. The 100-IP wall is itself a
signal (WAF-bypass-list probing) that deserves a short badge, not a
flood.

Backend:
- get_attacker_ip_leaks gains `limit: int = 10` parameter — caller
  only ever needs a sample, not the full set.
- New count_attacker_ip_leaks() returns the unbounded COUNT(*) via
  one cheap SQL aggregate.
- Detail endpoint returns {ip_leaks: [first 10], ip_leaks_total: N}
  so the UI can render a rotation badge independent of list length.

UI:
- New LeakedIPsRow component. First 5 distinct IPs rendered inline
  with hover tooltips (unchanged). When > 5, a `+ N more` expand
  button reveals the rest of the sample; when total exceeds the
  10-row cap, a subtle `(+M beyond sample)` note appears.
- When total ≥ 20, a red `ROTATION · N` tag renders leading the
  row with a tooltip explaining the semantic: "almost certainly
  XFF-rotation / WAF-bypass probing, not a real attribution leak."

DB churn is deliberately not capped — 100k rows × ~500 B is tolerable.
If it becomes a problem we can add an ingester-side count-and-skip;
for now the UX fix is the whole story.

Added test_ip_leaks_total_reported_separately_from_list asserting
the endpoint shape matches what the UI consumes.
2026-04-24 18:25:46 -04:00
ca39552692 feat(ua): classify User-Agent into scanner/cli/library/bot/nonstandard
Every http_useragent bounty now carries a `category` label plus an
optional tool name and a signals list. The main analytic win is the
`nonstandard` bucket — UAs like "FUCKYOU/1.0" or custom one-off
scanner labels that don't match any known pattern, which today
silently blend into the generic fingerprint list.

Buckets (priority order):

- scanner: nmap, nuclei, sqlmap, gobuster, nikto, masscan, zgrab,
  ffuf, wpscan, katana, burp, acunetix, nessus, openvas, arachni,
  whatweb, wappalyzer, etc.
- cli:  curl, wget, httpie, xh, fetch.
- library: python-requests, aiohttp, httpx, urllib, Go stdlib, Java,
  okhttp, Apache HttpClient, axios, node-fetch, got, undici, PHP,
  Guzzle, Ruby stdlib, Faraday, .NET, PostmanRuntime, Insomnia, etc.
- bot:  anything containing bot / crawler / spider / slurp / monitor
  (catches Googlebot, bingbot, Baiduspider — many of which ship a
  Mozilla/5.0 prefix, so the bot check runs BEFORE the browser
  regex).
- browser: Mozilla/5.0-prefixed UAs that aren't bots.
- nonstandard: anything else. The interesting bucket.
- empty: literal empty User-Agent header.

Side signals computed regardless of category: suspicious_short (<8
chars), suspicious_long (>512 chars), nonprintable (control chars),
injection_like (SQLi / XSS / path-traversal / Log4Shell markers).
A sqlmap UA with a literal SQL-injection payload embedded fires
category=scanner + injection_like — the combination tells the
analyst the tool is being operated manually vs. on default config.

Classification is deterministic (same UA string → same tuple) so
add_bounty's payload-hash dedup continues to collapse repeat rows.

UI renderer upgraded from FpGeneric to a dedicated FpUserAgent that
colours the category tag by risk (scanner=alert-red,
nonstandard=warn-yellow, browser=accent-green, etc.) and renders
each signal as its own chip. Makes the interesting rows pop in the
fingerprints panel.

Also fixed: the ingester was using `_headers.get("User-Agent") or
_headers.get("user-agent")`, which short-circuits away empty-string
UAs. An explicit empty UA is itself a signal (real clients always
send something) — now captured.
2026-04-24 18:17:18 -04:00
6d1d69443a fix(xff): split leak from spoof — loopback/private claims aren't leaks
An attacker hitting /admin with `X-Forwarded-For: 127.0.0.1` was
previously flagged as an IP leak. It isn't — that's the classic
IP-allowlist / WAF-bypass payload ("treat me as localhost and skip
your auth checks"). Misclassifying it as "LEAKED IPs" in the UI
confuses analysts and burns trust in the signal.

Split by claim category. After pulling the left-most claimed IP
from the proxy header, classify:

- public (routable) → bounty_type=ip_leak (real attribution leak;
  the attacker's upstream proxy forwarded their real IP).
- loopback / private / link-local / multicast / reserved /
  unspecified → bounty_type=fingerprint, fingerprint_type=
  spoofed_source (WAF-bypass / allowlist-probing attempt; the
  attacker is telling us they know what XFF does).
- unparseable → dropped.

Same extraction pipeline; diverges only at the last step. A new
shared _classify_proxy_header_claim returns (kind, payload);
_detect_ip_leak keeps its public-only contract for backward-
compat; _detect_spoofed_source is the new sibling.

UI renderer FpSpoofedSource shows the claimed IP in warn color with
the claim_category tag (LOOPBACK / PRIVATE / ...) and a WAF-BYPASS
ATTEMPT badge — distinct visual from the "LEAKED IPs" row which
stays reserved for genuine public-IP leaks.

Test addresses updated: RFC 5737 doc ranges (198.51.100.0/24,
203.0.113.0/24) are flagged `is_reserved` in Python's ipaddress
module, so they now correctly belong to the spoof bucket — tests
that meant to exercise real public IPs now use 8.8.8.8 / 1.1.1.1 /
Cloudflare DNS. Added eleven new tests locking the classifier +
the two detectors' mutual exclusion.
2026-04-24 18:06:29 -04:00
2c876b4d86 fix(bounties): strip per-request fields from fingerprint payloads
add_bounty dedups on (attacker_ip, bounty_type, full payload JSON).
Three fingerprint-family bounties (http_useragent, ip_leak,
http_quirks) were including method/path / header_count in their
payloads — fields that vary per request — so a scanner hitting 100
paths produced 100 rows instead of 1, which is what was swelling
AttackerDetail.

Payloads now carry identity-only fields:

- http_useragent: {fingerprint_type, value}. UA + path combinations
  no longer collide; one row per distinct User-Agent string.
- ip_leak: {source_ip, real_ip_claim, source_header, headers_seen}.
  One row per distinct (proxy source, leaked IP, leaking header)
  triple; repeat hits with the same header on different paths dedup.
- http_quirks: {fingerprint_type, order_hash, order, casing_hash,
  casing_category, stable_count, tool_guess}. No more header_count
  (included volatile headers; Cookie-presence variance broke dedup).

Per-request context (path, method, etc.) was never load-bearing for
analysts — the logs table already answers "when + where" at
per-event resolution. The bounty table is for stable identity.

UI:
- FpHttpQuirks renderer drops the method/path footer line and the
  header_count/duplicates tags; shows stable_count instead.
- LEAKED-IPs tooltip on AttackerDetail swaps "X on GET /path" for
  "Leaked via X; source 203.0.113.42" — same information, stable.

Tests add a "payload stable across paths and methods" assertion on
http_quirks — locks the contract so a future regression that sneaks
a per-request field back in fails loudly.

Existing duplicate bounty rows don't retroactively collapse.
Dev: `decnet db-reset --i-know-what-im-doing drop-tables` and
restart. Prod: one SQL pass to dedup by (attacker_ip, bounty_type,
payload) — trivial but not automated.
2026-04-24 17:58:54 -04:00
dccb410bb3 feat(http): header-quirks fingerprint — order + casing + tool guess
Per-request HTTP fingerprint derived from the header dict we already
log. Captures:

- order_hash: SHA-256 prefix (16 hex) over the lowercased header-name
  sequence, minus volatile/per-request headers (Content-Length,
  Cookie, Authorization, XFF family, trace IDs). Stable identity for
  a given client stack regardless of which target / path is hit.
- casing_hash: same shape but over the per-header casing category
  (Title-Case / lower / UPPER / mixed). Attackers frequently spoof
  User-Agent but forget their stack sends `user-agent` while browsers
  send `User-Agent`.
- tool_guess: prefix match against curl / python-requests /
  Go-http-client / nmap-nse signatures. Cheap, best-effort — the
  hash is the hard signal.
- duplicates: reserved for when the HTTP template switches from
  dict(request.headers) to a list form; today it always fires empty
  because dict() collapses duplicates.

Payload is a fingerprint bounty (bounty_type="fingerprint",
fingerprint_type="http_quirks"). Bounty dedup collapses identical
hashes per attacker — one row per distinct fingerprint — so a chatty
scanner doesn't spam the vault, but a tool-chain change from the
same IP surfaces as a new row.

UI renderer (FpHttpQuirks) shows the two hashes, tool guess badge in
violet, casing/count tags, and a collapsible header-order list.
Added to the passiveTypes group so it nests with JA3/JA4L/etc. in
the AttackerDetail fingerprints panel.

One library note: the naive "title-case" classifier failed on tokens
like `X-Forwarded-For` because Python's "".islower() returns False
so `p[1:].islower()` rejects single-letter tokens like the `X`.
Fix: explicitly accept single-char tokens when uppercase.
2026-04-24 17:51:40 -04:00
2a0c5ca410 feat(attackers): XFF mismatch detection — attacker IP leak bounties
Attackers routinely front their scanners with VPNs/proxies, so the
TCP source we log is the proxy egress, not the real host. But a
surprising number of attacker setups are misconfigured: the proxy
forwards the real IP in an X-Forwarded-For (or Forwarded / X-Real-IP
/ CDN-variant) header. From our side that's a free attribution leak.

New _detect_ip_leak extractor in decnet/web/ingester.py fires at
ingest time per HTTP request. Logic:

1. Require service=http, source_ip present, headers present.
2. If source_ip ∈ DECNET_TRUSTED_PROXIES (comma-separated IPs or
   CIDRs) → legitimate reverse-proxy forwarding, skip.
3. Walk proxy-family headers in priority order: Forwarded (RFC 7239)
   → X-Forwarded-For → X-Real-IP → True-Client-IP → CF-Connecting-IP.
4. Extract the left-most parseable IP from the winning header.
5. If that IP differs from the TCP source → emit a bounty with
   bounty_type="ip_leak" carrying {source_ip, real_ip_claim,
   source_header, headers_seen, path, method}.

Storage is the existing Bounty table — no schema change; de-dup is
handled by Bounty's (attacker_ip, bounty_type, payload_hash) key, so
repeat requests with the same leaked IP don't spam.

AttackerDetail renders a warn-accent "LEAKED IPs:" row under ORIGIN
listing distinct real_ip_claim values; hover tooltip shows the source
header + path of the most recent leak. Only shown when at least one
ip_leak bounty exists.

RFC 7239 Forwarded parser handles the full vocabulary — bare IPv4,
IPv4:port, quoted, IPv6 in brackets, IPv6 with port — returning only
IPs that actually parse.

Closes DEVELOPMENT.md "Network Topology Leakage → X-Forwarded-For
mismatches". Phase 3 of the three-phase Attacker Intelligence series
(phases 1: scanned-vs-interacted, 2: PTR records already shipped).

DECNET_TRUSTED_PROXIES env shape matches THREAT_MODEL DA-08's
"revisit when verified-proxy config lands" note — same token set
future rate-limit work will consume.
2026-04-24 17:39:03 -04:00
5a34371009 feat(attackers): PTR record (reverse DNS) enrichment
Resolve each attacker IP's rDNS name once at first sighting, store on
Attacker.ptr_record, render on AttackerDetail under ORIGIN. Many
attackers run infrastructure with forgotten rDNS that instantly
identifies them once surfaced: scan-node-42.shodan.io,
shady-vps.leasecloud.net, etc.

Resolver lives in decnet/geoip/ptr.py — colocated with enrich_ip
because the shape matches (take an IP, return supplementary
metadata, never raise). Uses the OS resolver via socket.gethostbyaddr
offloaded to the default executor, wrapped with asyncio.wait_for
timeout=2s so a slow authoritative NS can't stall the profiler tick.

Profiler side: _WorkerState grows a ptr_attempted: set[str] bounding
resolution to once per worker lifetime. Cold-start batches resolve
concurrently (Semaphore(_PTR_CONCURRENCY=10)) so a backlog doesn't
serialize 2s ceilings. _build_record gains a keyword-only ptr_record
parameter that, when _UNSET, omits the key from the record dict —
upsert_attacker's attribute-merge loop then preserves whatever's
stored on the row. Explicit None is a "fresh failed attempt" signal
and gets written through.

Env kill-switch DECNET_PTR_ENABLED=false for locked-down deploys
where egress DNS is forbidden. Private / loopback / link-local /
multicast / reserved addresses short-circuit before any DNS call.
IPv6 reverse DNS works transparently through the stdlib resolver.

Schema change — run once on upgrade:

  ALTER TABLE attackers
    ADD COLUMN ptr_record VARCHAR(256) NULL DEFAULT NULL;

Or drop-and-recreate on dev boxes (db-reset's SQLModel.metadata-driven
table discovery now picks it up automatically since ba155b7).

tests/conftest.py disables DECNET_PTR_ENABLED globally for the same
reason it disables DECNET_GEOIP_ENABLED — unit tests must never hit
the network. tests/geoip/test_ptr.py re-enables explicitly via an
autouse fixture.
2026-04-24 17:26:40 -04:00
351a8939c3 feat(attackers): scanned vs. interacted service bucketing on detail page
Adds a new card on AttackerDetail: SCANNED · N services | INTERACTED
WITH · M services. Distinguishes port-scanners (N high, M=0) from
actual engagement (M>0) at a glance — the analyst's first question
when triaging a new attacker row.

Classifier lives in decnet/correlation/event_kinds.py, a single
source of truth for the event-type vocabulary:

- INTERACTION_EVENT_TYPES — command-family (command/exec/query/...),
  SMTP engagement (mail_from/rcpt_to/message_accepted), file/payload
  activity (file_captured/upload/download_attempt/retr), pub/sub
  (publish/subscribe), recorded TTY sessions.
- NOISE_EVENT_TYPES — DECNET-internal (startup/shutdown/parse_error/
  unknown_*).
- Everything else defaults to scan. Conservative by design: new
  template verbs show up as "scanned" until explicitly promoted.

Bucket logic: a service is "interacted" if ≥1 of its events
classifies as interaction; otherwise "scanned" if ≥1 scan event;
noise-only services drop. Disjoint by construction.

Deliberate no-schema path: compute on-the-fly in the detail endpoint
via SELECT DISTINCT service, event_type FROM logs. Small result set
(tens of pairs per attacker), cost is trivial vs. the existing
behavior/commands queries. Trade-off: one more DB round-trip per
detail view in exchange for zero ALTER TABLE migration pain and
immediate classifier-change feedback loop.

Profiler's _COMMAND_EVENT_TYPES stays as-is (strict subset of
interactions that carry executable text), with a comment pointing at
the new canonical module.

Closes DEVELOPMENT.md "Attacker Intelligence §Service-Level Behavioral
Profiling — Services actively interacted with".
2026-04-24 17:12:20 -04:00
ce6b4a4174 fix(web/api): scope DB-retry sleep so tests don't starve background tasks
test_lifespan_db_retry patched decnet.web.api.asyncio.sleep to skip the
DB-retry backoff. Problem: asyncio is a shared module — the patch leaks
to every caller that looked up asyncio.sleep via `import asyncio`,
including run_health_heartbeat's own sleep loop. That heartbeat task
spawns inside the same lifespan; with its sleep mocked, the while-loop
spins tight, starves cancellation, and leaves an orphan task that
pytest-timeout eventually signals — surfacing as the 'Task exception
was never retrieved' warnings the user saw when running the suite.

Fix: give decnet.web.api a local binding `_retry_sleep = asyncio.sleep`
for the DB-retry wait, and have the test patch that instead. Narrowly
scoped, no impact on asyncio.sleep callers elsewhere.

Test timing before: 12s with --timeout=10 (interrupted by signal).
Test timing after: 0.58s. Full tests/web slice: 27s → 7.1s with the
spurious warnings gone.
2026-04-24 17:11:44 -04:00
efc98285aa fix(webhook/worker): self-heal when bus starts late or restarts
Before: if the bus was unreachable at worker start, we logged
"running in idle mode" once and parked on shutdown forever. systemd
doesn't guarantee bus is fully up before the webhook worker starts,
so a race on boot left the worker permanently dead until restart.

Now: wrap the whole bus-use in an outer reconnect loop.

  while not shutdown:
    try: connect()
    except: sleep(RECONNECT_SECS) ; continue
    try: run_with_bus(...)       # heartbeat + dispatch
    except: log+close ; reconnect on next iter

Clean consequence: if the bus dies mid-operation the dispatch loop's
subscriptions raise inside the consumer tasks, `_run_with_bus` exits,
the outer loop closes the stale connection and reconnects. No partial
state leaks across epochs — fresh bus, fresh subs, fresh heartbeat.

Interval is 60s by default, overridable via
DECNET_WEBHOOK_BUS_RECONNECT_SECS. Shutdown wakes the wait so
systemctl stop doesn't hang for a minute.

Test added: flaky get_bus that fails once, then returns a live
FakeBus — asserts retry + successful delivery.

get_app_bus() in decnet/bus/app.py already has a 2s backoff retry so
the FastAPI hot path self-heals; this commit brings the standalone
webhook worker in line with the same posture.
2026-04-24 16:39:38 -04:00
f0ee6ff97e feat(workers): enroll webhook worker in the Workers panel registry
Add "webhook" to KNOWN_WORKERS + the start-all preferred order so the
Config → Workers panel picks up the row automatically: heartbeat
subscription, start/stop controls via the existing systemd helper
(decnet-webhook.service.j2 already lands via decnet init's unit
glob), and the status-dot lifecycle all come for free.

Placed between mutator and the swarm-only agent/forwarder/updater
trio — matches the intended startup sequence (bus → api → data-plane
workers → egress → swarm management).

No frontend change needed; Config.tsx reads the worker list
dynamically from GET /api/v1/workers.
2026-04-24 16:34:14 -04:00
ba155b70e1 fix(cli/db-reset): drive table list from SQLModel.metadata, not hardcoded
The hardcoded _DB_RESET_TABLES tuple had drifted — session_profile,
smtp_targets, and webhook_subscriptions were all missing, so
`decnet db-reset --i-know-what-im-doing drop-tables` silently left
them behind. Running it on a post-webhook install then letting
SQLModel.metadata.create_all() re-create tables produced a partial
schema: old rows survived, new columns didn't land, and endpoints
500'd on the missing columns (e.g. auto_disabled_at after the
circuit breaker merge).

Replace the hardcoded list with `SQLModel.metadata.sorted_tables`,
reversed for DROP safety (children first). Any future model addition
is auto-enrolled — no manual step, no more drift.

No behavior change on reset semantics; the SET FOREIGN_KEY_CHECKS=0
fence still covers any edge case the sort order misses.
2026-04-24 16:31:10 -04:00
2bcef50ac5 feat(webhooks): circuit breaker auto-disables misbehaving subscriptions
After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed
deliveries, the worker calls trip_webhook_circuit(uuid, ts) which
flips enabled=False and stamps auto_disabled_at. The worker sets its
reload flag so the next dispatch epoch stops consuming events for the
tripped sub entirely — one dead receiver can't poison the shared
egress pool anymore.

Operator clears the trip via PATCH — setting enabled=True when the
sub was previously disabled clears auto_disabled_at, zeros
consecutive_failures, and clears last_error. Admin-pause → re-enable
hits the same path harmlessly.

Three observable states now distinguishable in the UI:
- Active              enabled=True,  auto_disabled_at=NULL
- Admin-paused        enabled=False, auto_disabled_at=NULL
- Tripped             enabled=False, auto_disabled_at=<ts>

UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and
a "N TRIPPED" count in the page header. Hover tooltip tells the
operator how to reset ("Re-enable via Edit").

record_webhook_failure now returns the new consecutive_failures count
so the worker can compare against the threshold without a second
roundtrip. trip_webhook_circuit is idempotent — re-tripping just
re-stamps auto_disabled_at.

Closes THREAT_MODEL WH-02 and DEBT-037 §1.
2026-04-24 16:24:33 -04:00
ee682eef65 feat(web/webhooks): surface manual FIRE button per row
The per-row test-delivery action already existed as an icon-only 
zap in the ACTIONS column — backed by POST /webhooks/{uuid}/test,
which fires a synthetic test.ping event through the normal HMAC-
signed delivery path with retries disabled. Too easy to miss.

Replace the icon-only button with a labeled [ FIRE] violet-accented
button so it reads as an emphasized dev-tool action right next to
edit/delete. Tooltip now spells out the backend endpoint and "fire
a synthetic test event" intent.

No backend change. Widens the actions column to 180px to accommodate
the label.
2026-04-24 16:15:47 -04:00
731063b96e chore(scripts): mock webhook receiver for local DECNET testing
Python stdlib ThreadingHTTPServer that accepts any POST path, optionally
verifies HMAC against --secret / $DECNET_MOCK_SECRET, and pretty-prints
each delivery with topic / event-id / signature status. Pass --fail 503
to exercise the worker's retry/backoff path.

Point a webhook at http://localhost:8765/ and you'll see every delivery
land with color-coded HMAC OK / MISMATCH / UNVERIFIED badges. No deps.
2026-04-24 16:13:59 -04:00
4d10eba7a7 fix(web/webhooks): match LiveLogs page-header convention
The webhooks page used a bespoke .webhooks-header wrapper that didn't
line up with the rest of the dashboard (Fleet / Logs / Swarm all use
the .<page>-root + .page-header + .page-title-group + .actions
pattern). Swapped to that convention:

- .webhooks-root wrapper, matching .logs-root / .fleet-root spacing.
- H1 "WEBHOOKS" in .page-title-group; subtitle shows
  `N CONFIGURED · M ENABLED [· K FAILING] [· L INSECURE]` in
  .page-sub, same voice as the LOGS stream summary.
- Actions (CREATE WEBHOOK, DELETE SELECTED) sit in .actions.
- Table lives in a proper .logs-section shell with a .section-header
  carrying the Webhook icon + "SUBSCRIPTIONS" title.
- All scoped button overrides (violet/alert/warn/ghost) copied from
  the LiveLogs scope so theme switches behave identically.

Also improve error messaging: extractErrorDetail now maps 401 to
"Session expired" and 403 to "Insufficient permissions (admin only)"
instead of falling through to the generic "Failed to load webhooks".
Helps users who hit the page as viewer or with a stale token see why
it failed.
2026-04-24 16:11:20 -04:00
59c405d9e5 feat(web): Webhooks page + ALERTS nav group
New /webhooks admin page with table-based subscription management:
- CREATE WEBHOOK (inline form row — no modal) with simple-event
  checkboxes (AttackerDetail / DeckyStatus / SystemStatus) that
  expand to bus-topic patterns server-side, and an advanced-mode
  textarea for raw NATS-style patterns.
- Bulk-select + DELETE SELECTED with two-click arm pattern.
- Per-row test-ping (zap), pencil edit, and delete actions.
- Last-fired timestamp column.
- Yellow banner surfacing insecure_url warnings (WH-03): http:// is
  allowed but flagged so operators see it on every page load.
- Post-create secret modal — the secret is shown exactly once with
  a COPY button and a clear "won't see this again" notice.

Sidebar nav regrouped: /live-logs and /webhooks now live under a new
ALERTS NavGroup (Bell icon). The alertCount badge rides the Live
Logs sub-item. Command palette gains a "Webhooks" GO TO entry with
the `G W` chord.

Side-fix: useFocusSearch.ts was failing the build under
verbatimModuleSyntax (pre-existing, unrelated). Split the React
import to satisfy tsc; no behavioural change.
2026-04-24 16:03:53 -04:00
c2ff8d1a4f docs(debt): DEBT-037 — webhook delivery guarantees beyond MVP
The webhook MVP shipped with deliberate deferrals; this entry names
them so future PRs know exactly what's left to close: circuit
breaker, dead-letter table, delivery audit log, batch/coalescing,
per-subscription rate limiting, payload templates per destination,
and secret encryption at rest.

Non-negotiable even at MVP scope (HMAC signing, bus-off degraded
mode, jittered retry backoff) is called out explicitly to prevent
future contributors from weakening it under the banner of
"simplification."
2026-04-24 16:03:33 -04:00
638236113d feat(webhooks): non-blocking http:// warning + WH-03 accepted risk
WebhookResponse now carries a `warnings: list[str]` field. When the
subscription's URL starts with http://, an `insecure_url` advisory is
surfaced on every GET/CREATE without blocking the request. HMAC still
detects tampering regardless of transport — only read-confidentiality
is lost over plaintext — and test/dev environments without TLS stay
usable.

Matches the operator-trust posture already established by DA-06
(admin-on-admin protection is out of scope). The alternative — hard
rejection at admin time — was considered and declined; warning-plus-
visibility is the right shape.

THREAT_MODEL WH-03 accepted risk registered; revisit triggers are
multi-admin delegation, a regulated customer, or an operator ticket
asking for a DECNET_WEBHOOK_REQUIRE_HTTPS enforcement knob.
2026-04-24 15:53:30 -04:00
f84bf82f6c docs(webhook): roadmap tick + threat-model component
- DEVELOPMENT.md: tick the "Real-time alerting" roadmap item with a
  note that Slack/Telegram-specific senders remain per-destination
  follow-ups (they accept generic webhook payloads already).
- THREAT_MODEL.md: new Component 2 — DECNET↔External webhook
  destination. DFD, full STRIDE table, WH-01 (secret at rest) and
  WH-02 (half-dead-receiver retry waste) registered as accepted
  risks pointing at DEBT-037 for post-MVP hardening. Checklist lists
  two open items: OpenAPI schema omits `secret`, and http:// URL
  rejection at admin time.
2026-04-24 15:48:14 -04:00
e6127a81a1 feat(webhook): worker + CLI + systemd unit
Introduces the `decnet webhook` long-running worker that consumes the
internal bus and POSTs matching events to configured subscriptions.

Design: one task per (subscription, pattern) pair. Each task opens
its own bus subscription, iterates events, and dispatches via the
shared deliver() client. No intermediate queue, no in-memory filter
matching — the bus's own pattern matcher is the filter. Reloads on
`system.webhook.subscriptions_changed` signals from the CRUD router,
with a 60s fallback timer in case a signal is lost.

Shutdown propagates via CancelledError on the outer task; all inner
subscription tasks are cancelled and awaited in a finally block.
Bus unavailable → worker stays up in idle mode per the DEBT-031
pattern, logging one warning.

Registered as a master-only CLI command (agents don't configure
webhooks — the subscription store lives on master). systemd unit
mirrors the profiler template; added to decnet.target Wants= list so
`systemctl start decnet.target` brings it up alongside everything
else. `decnet init` auto-picks up the new .service.j2 via its
existing `glob("decnet-*.service.j2")` sweep.
2026-04-24 15:46:11 -04:00
b70845a85d feat(webhooks): subscription CRUD + HMAC-signed delivery client
Introduces the webhook egress foundation — a new WebhookSubscription
table, admin-gated CRUD under /api/v1/webhooks, and the shared
delivery client that both the test-ping route and the upcoming worker
will use. No worker yet; this commit is API + model + client only.

Simple-mode enum (AttackerDetail / DeckyStatus / SystemStatus) expands
to bus-topic patterns at the router layer; storage is always the raw
pattern list. Advanced mode lets admins supply raw NATS-style patterns
directly. Filter-at-subscribe: the worker (next commit) will subscribe
to the union of patterns across enabled subscriptions.

Delivery client handles HMAC-SHA256 signing (X-DECNET-Signature),
retry on 429/5xx/network errors with jittered backoff, no-retry on
4xx. Secrets never leave the server on GET/LIST — only the create
response carries the secret for copy-out.

CRUD routes publish WEBHOOK_SUBSCRIPTIONS_CHANGED on the bus after
every mutation so the (future) worker can hot-reload.

Opens DEBT-037 for the deferred items (circuit breaker, dead-letter,
batch delivery, payload templates, secret-at-rest).
2026-04-24 15:30:05 -04:00
162f7c1194 feat(api/sse): per-user connection cap + viewer-safe invariant
New decnet/web/sse_limits.py provides sse_connection_slot, an async
context manager that counts live SSE connections per user UUID and
raises 429 when a per-user cap is exceeded (default 5, override via
DECNET_SSE_MAX_PER_USER). Wired into both SSE generators as their
first async with, so the cap check fires before any stream data is
yielded.

The cap must sit inside the generator — StreamingResponse returns
before the generator body runs, so a handler-level wrapper would
release the slot immediately. Put prefetch + slot + loop all under
the one async with.

Also documents F6/I (role leakage) as mitigated-by-construction via
handler docstrings: every event type on both streams wraps data
already reachable via viewer-gated REST, so no per-event filter is
needed until a new event family is introduced. The invariant is
written into the handler docstrings so a future PR can't silently
add admin-only events.

Resolves THREAT_MODEL F6/I and F6/D.
2026-04-24 15:01:20 -04:00
df84981954 feat(api): pin response_model on dict-returning mutation routes
Every mutation route that returned an untyped dict now declares
response_model at the decorator. MessageResponse covers the eight
{"message": ...} envelopes (change-password, mutate-decky, mutate-
interval, update-deployment-limit, update-global-mutation-interval,
delete-user, update-user-role, reset-user-password). Purpose-built
models cover the richer shapes (DeployResponse for /deckies/deploy,
PurgeResponse for /config/reinit, ReapReportResponse for /reap-orphans,
UserResponse for /config/users). 204-No-Content and Response/
ORJSONResponse routes stay as-is.

The wire shape for clients is unchanged — the envelopes already only
shipped a message field. What changes is that a handler which
accidentally returns a richer dict (e.g. a full user row including
password_hash) would be silently stripped to the declared fields at
serialization time.

Also flips F4/D "expensive LIKE" to accepted (new DA-09) — the /logs
and /attackers search routes LIKE-scan unbounded columns, but both are
admin-gated, limit-capped, and operator rate-limit scope per DA-04.
FTS5 stays a performance TODO, not a security blocker.
2026-04-24 14:27:58 -04:00
a935bf2663 feat(api): cap offset on list-topologies and transcript endpoints
The other five query endpoints (/logs, /attackers, /attacker-commands,
/bounties, /topologies/{id}) already declared le=2147483647 on offset;
these two were inconsistently uncapped. Bring them in line to close
the F4/D deep-pagination row.

Also resolves F4/T (ORM sort injection — already mitigated by the
regex pattern on /attackers sort_by, no other route accepts a column
name) and F4/D (limit cap — already universal) with code pointers.
2026-04-24 14:14:25 -04:00
e53b580767 test(api): RBAC contract test — viewer JWT on every classified route
New test walks app.routes, classifies each APIRoute as admin/viewer/open
by identity-matching require_admin / require_viewer closures inside the
route's dependency tree, then asserts:
  - admin routes return 403 to a viewer JWT
  - viewer routes return neither 401 nor 403 to a viewer JWT
SSE routes skipped (separate scope under F6). Role hints deliberately
NOT encoded in the OpenAPI spec — classification stays server-side so
/openapi.json can't be used to enumerate admin routes.

Resolves THREAT_MODEL F2/I + F5/E; paired with the existing
test_schemathesis.py::test_auth_enforcement (401-half coverage).
2026-04-24 14:00:12 -04:00
99ccd41bb5 feat(api/artifacts): explicit Content-Disposition + X-Content-Type-Options
Harden the attacker-controlled artifact download path (F7) with explicit
response headers instead of relying on Starlette's defaults (which only
emit attachment for non-ASCII filenames and never set nosniff). Also
resolves the THREAT_MODEL F7 path-traversal row (containment check was
already in _resolve_artifact_path) and the fleet-deploy detail=str(e)
audit (all four sites are admin-gated deliberate validator UX or
structured worker-response fields).
2026-04-24 13:24:34 -04:00
ec1079e78b feat(profiler): wire p0f-v2 matcher into sniffer_rollup priority chain
The ~30-signature hand-rolled p0f-lite table in decnet/sniffer/p0f.py
misses most real-world attackers (yesterday's SLOW SCAN being a
textbook case — 9 hours of events, 19 hits, os_guess = NULL). The
375-sig vendored p0f v2 DB was already there; this commit actually
calls it.

New resolution chain in sniffer_rollup:

  1. Enabled OS-fingerprint providers (p0f-v2 default, via
     DECNET_OSFP_PROVIDERS) tried in declared order. Provider with
     highest-confidence match across all enabled sources wins.
  2. Modal os_guess label from the sniffer's hand-rolled p0f.py.
     Kept as fallback because v2's DB predates post-2006 kernels.
  3. TTL bucket (linux / windows / embedded). Coarse but never wrong.

Wiring details:

- _match_via_osfp_providers: never raises — factory / provider
  failures collapse to None and the chain falls through to the
  old modal-label / TTL path. A corrupt .fp file or misconfigured
  DECNET_OSFP_PROVIDERS must never wedge a profile rebuild.
- tcp_fp_context tracks whether the LATEST tcp_fp snapshot came
  from a passive SYN ('syn' → p0f.fp) or an active prober probe
  ('synack' → p0fa.fp). Routes to the right sig list.
- initial-TTL normalisation via decnet.sniffer.p0f.initial_ttl.
  Observation's TTL may be N hops below the OS's initial; v2
  signatures match on the canonical bucket.

Soft-field semantics on Signature.score(): df and total_len are now
skip-checked when the observation is missing them. Sniffer doesn't
currently emit either SD field; a literal-constraint sig
shouldn't hard-reject a match solely because of upstream
incompleteness. Hard fields (window, ttl, options_sig, quirks)
still hard-reject on absent/mismatched input — those are the real
discriminators. Promote df / total_len back to hard the moment the
sniffer starts emitting them.

+2 integration tests on TestSnifferRollup, +2 soft-field tests on
test_signature. Full regression: 166 tests across tests/prober/osfp
+ tests/profiler all green.
2026-04-24 11:56:50 -04:00
8a430bf725 feat(prober/osfp): P0fV2Provider + factory dispatch
- decnet/prober/osfp/p0f/provider.py: P0fV2Provider loads the four
  vendored .fp files into per-context signature lists (syn / synack /
  rst / stray) and matches via highest-specificity score across the
  relevant list. Also auto-picks up p0f-decnet.fp if present (GPL-3.0
  additions land there later, empty for now).
- decnet/prober/osfp/factory.py: get_provider / get_all_providers /
  reset_cache, mirrors decnet/geoip/factory exactly. Env-dispatched
  via DECNET_OSFP_PROVIDERS (default "p0f-v2"). Reserved names
  "nmap-osdb" (pending Fyodor's grant) and "decnet-observed" (our
  future curated DB) raise NotImplementedError — visible on the
  factory surface so a typo doesn't silently fall through.
- decnet/prober/osfp/__init__.py now re-exports the public API so
  callers use `from decnet.prober.osfp import get_provider` without
  reaching into submodules (upholds the provider-subpackage rule).

15 new provider+factory tests covering:
- All four DB contexts load (262/61/46/6 sigs per inventory).
- Known-good Linux 2.6 SYN + Linux 2.2 SYN-ACK match end-to-end.
- Unknown observations / contexts return None, not raise.
- Factory memoises, env override honoured, unsupported names raise.
- Reserved names raise NotImplementedError (not silent None).

`sniffer_rollup` wiring lands in the next commit.
2026-04-24 11:50:46 -04:00
41ff6b4b03 feat(prober/osfp): p0f v2 .fp parser + Signature scoring
First code layer of the OS-fingerprinting work on top of yesterday's
vendored p0f v2 database. Three new modules, all pure (no I/O outside
of the parser's file read):

- decnet/prober/osfp/base.py — Provider protocol + OsMatch dataclass
  matching the established Provider convention in decnet/geoip and
  decnet/bus. Docstring spells out the never-raise invariant: malformed
  input returns None, so a single bad event can't wedge a whole
  attacker-profile rebuild.

- decnet/prober/osfp/p0f/signature.py — Signature dataclass + three
  predicate helpers (WindowSpec / IntSpec / OptionToken) encoding the
  p0f v2 DSL's wildcard / modulo / MSS-multiple / MTU-multiple
  semantics. Scoring is our extension on top of upstream p0f's
  first-match-wins policy: each signature carries a precomputed
  specificity in [0, 1] so the factory can pick the most-specific
  match when multiple signatures fire against one observation.

- decnet/prober/osfp/p0f/format.py — .fp line parser. Every shipped
  field variant from the DSL spec at the top of p0f.fp is covered
  (Snn / Tnn / %nnn / * for window; T0 vs T; -/@/* os-genre prefixes;
  quirks as concatenated single-letter flags; '.' sentinels for
  no-options / no-quirks). Malformed lines log a warning and skip
  instead of aborting the whole file — 1 bad row must not cost the
  other 374.

20 parser tests + 14 scoring tests. Full vendored-DB smoke tests
confirm all 375 signatures parse round-trip (262 SYN + 61 SYN-ACK +
46 RST + 6 stray) and every computed specificity lands in [0, 1].
2026-04-24 11:47:54 -04:00
620e1f5b1d feat(prober): vendor p0f v2 TCP/IP fingerprint database (LGPL-2.1 → GPLv3 via §3)
Ships the p0f v2.0.8 signature database for passive + active OS
fingerprinting. 375 total signatures across four probe contexts:

- p0f.fp  (262 sigs) — passive SYN fingerprints
- p0fa.fp ( 61 sigs) — SYN-ACK response, for active probes
- p0fr.fp ( 46 sigs) — RST response quirks
- p0fo.fp (  6 sigs) — "stray" packet fingerprints

Replaces reliance on the 10-signature hand-rolled p0f-lite table in
decnet/sniffer/p0f.py for any match job the upstream DB covers.
Keeping the hand-rolled table as a fallback for modern kernels the
v2 DB pre-dates — v2 froze in 2006 so post-Win10 / post-Linux-3.x
kernels won't match against upstream directly. DECNET-authored
additions will go in a sibling p0f-decnet.fp under GPLv3 (not yet
committed; added as the ingester observes real honeypot traffic).

Provenance (full chain in data/README.md):

- Source: Debian snapshot of p0f_2.0.8.orig.tar.gz
- SHA1 matches Debian-recorded 7b4d5b2f24af4b5a299979134bc7f6d7b1eaf875
- Files byte-identical to upstream tarball (verified by hash)

License chain:

- Upstream: LGPL-2.1 (doc/COPYING preserved verbatim as
  data/LICENSE.p0f-upstream, Michal Zalewski's copyright intact).
- DECNET uses the LGPL-2.1 §3 explicit permission to convert to any
  version of the GPL. These files, as consumed in DECNET, are
  effectively GPL-3.0. Chain documented in data/README.md so an
  auditor sees the full reasoning.
- LGPL-2.1 → GPL-3.0 §3 conversion is a settled compat path; same
  mechanism the kernel uses for LGPL userland glue and many other
  projects apply daily.

Rejected path — nmap-os-db under NPSL — because NPSL adds
restrictions GPLv3 §7 prohibits us from accepting. An email is out
to Fyodor requesting an open-source-author exception grant, but we
don't block on it: p0f v2 is a genuine accuracy improvement in
its own right, and adding nmap-osdb later (if granted) plugs into
the same provider interface with zero refactor.

Directory layout mirrors the established provider-subpackage pattern
(see decnet/geoip/, decnet/bus/) per the feedback_provider_
subpackages memory: base + factory + impl/ subpackages, no flat
files. Parser + matcher + factory wiring land in the next commit
sequence.
2026-04-24 11:39:33 -04:00
011445b77a chore(license): add GPL-3.0-or-later LICENSE + pyproject metadata
DECNET had no LICENSE file and no license metadata in pyproject.toml
despite intent being GPLv3. Legally that meant the code was "all
rights reserved" by default, so anyone distributing it (including via
GitHub clones, mirrors, or the forthcoming swarm enroll bundles) was
technically in violation even though the operator's own intent was
copyleft.

- Add canonical GPL-3.0 text from gnu.org/licenses/gpl-3.0.txt as
  LICENSE (verbatim, 674 lines).
- Add license = "GPL-3.0-or-later" and license-files = ["LICENSE"]
  to pyproject.toml [project] (SPDX identifier per PEP 639).
- Add the matching OSI classifier plus a few other standard ones
  (Python 3.11, Linux, Security, Network Monitoring, Beta) that
  pyproject was silently missing.

Prereq for the forthcoming p0f-db vendoring: establishing DECNET's
own license explicitly closes the first question an auditor would
ask about any third-party data we embed.
2026-04-24 11:35:59 -04:00