Commit Graph

876 Commits

Author SHA1 Message Date
8033137be6 dev(ci): simplified pipeline
All checks were successful
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Merge testing → main (push) Has been skipped
CI / Lint (ruff) (push) Successful in 14s
CI / SAST (bandit) (push) Successful in 22s
CI / Dependency audit (pip-audit) (push) Successful in 25s
CI / Merge dev → testing (push) Successful in 10s
2026-04-28 18:16:26 -04:00
17097cc3dc chores(order): moved some markdown files to the development folder 2026-04-28 17:56:29 -04:00
89268f19fb dev(ci): modified ruff call to not check dev scripts/
All checks were successful
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 20s
CI / Dependency audit (pip-audit) (push) Successful in 23s
CI / Test (Standard) (3.11) (push) Successful in 13m27s
CI / Test (Live) (3.11) (push) Successful in 1m17s
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
CI / Merge dev → testing (push) Successful in 11s
2026-04-28 17:47:28 -04:00
150e5eb5be docs(readme): add Ko-Fi button
Some checks failed
CI / Lint (ruff) (push) Failing after 16s
CI / SAST (bandit) (push) Successful in 24s
CI / Dependency audit (pip-audit) (push) Successful in 27s
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-28 16:15:02 -04:00
f2e01d8ea6 dev(ci): modified ci to not run fuzzing tests; done locally 2026-04-28 16:13:32 -04:00
15b2e7ba5c refactor(db): split credentials.py into a credentials/ subpackage
Splits the 459-line credentials.py into two submixins plus a composing
CredentialsMixin in credentials/__init__.py:

  _core.py    (~190)  Credential capture: upsert, list, filters,
                      per-attacker / per-secret reads, attacker_uuid
                      backfill
  reuse.py    (~270)  CredentialReuse correlation: upsert, candidate
                      mining, list/get + the _enrich_with_secret helper
                      that lifts the printable/b64 from underlying rows

_merge_unique stays with reuse.py (its only caller).
_enrich_with_secret stays with reuse.py — it's an internal helper of
list_credential_reuses / get_credential_reuse_by_id, never called
from the capture path.
2026-04-28 16:05:57 -04:00
3d00de8fd3 refactor(db): split attackers.py into an attackers/ subpackage
Splits the 494-line attackers.py into five submixin files plus a
composing AttackersMixin in attackers/__init__.py:

  _core.py       (~95)   Attacker CRUD + _deserialize_attacker
  behavior.py    (~110)  AttackerBehavior + _deserialize_behavior
  sessions.py    (~50)   SessionProfile read/write
  smtp.py        (~70)   SmtpTarget per-attacker + cross-attacker views
  activity.py    (~190)  log-derived activity (commands, leaks,
                         artifacts, stored mail, session log, transcripts)

IdentitiesMixin.list_observations_for_identity calls
self._deserialize_attacker; MRO resolves it onto AttackersCoreMixin
through the composed SQLModelRepository class.
2026-04-28 15:46:28 -04:00
5e7d68fde3 refactor(db): split topology.py into a topology/ subpackage
Splits the 694-line topology.py into five submixin files plus a
composing TopologyMixin in topology/__init__.py:

  _core.py       (~225)  topologies CRUD + _assert_pending /
                         _check_and_bump_version concurrency guards
  lans.py        (~115)  LAN CRUD
  deckies.py     (~130)  topology decky CRUD + list_running_topology_deckies
  edges.py        (~80)  edge CRUD + status-event log
  mutations.py   (~165)  live reconciler queue (atomic claim + state writes)

Sibling submixins call self._assert_pending and
self._check_and_bump_version; MRO resolves them onto TopologyCoreMixin
through the composed SQLModelRepository class.
2026-04-28 15:16:42 -04:00
20e89eb0a6 refactor(db): extract TopologyMixin
Moves the 31 MazeNET topology methods (topologies CRUD, LANs, deckies,
edges, status events, mutation queue) into sqlmodel_repo/topology.py.
Includes _assert_pending and _check_and_bump_version concurrency
guards.

This is the last domain extraction; sqlmodel_repo/__init__.py is now
~165 lines: lifecycle (initialize/reinitialize/migrations), the admin
self-heal seed, get_state/set_state, and the mixin composition.
2026-04-28 15:11:14 -04:00
7483d01311 refactor(db): extract IdentitiesMixin and CampaignsMixin
Splits the AttackerIdentity and Campaign clustering reads/writes into
sqlmodel_repo/identities.py and sqlmodel_repo/campaigns.py.

Both call _deserialize_attacker (identities only) which resolves
through AttackersMixin via MRO.
2026-04-28 15:07:39 -04:00
912171d053 refactor(db): extract AttackersMixin
Moves the 19 attacker-domain methods (core CRUD, behavior, sessions,
smtp targets, log-derived activity views) plus the _deserialize_attacker
and _deserialize_behavior helpers into sqlmodel_repo/attackers.py.
2026-04-28 15:04:51 -04:00
7ba8bafcaa refactor(db): extract CredentialsMixin
Moves the 12 credential and credential-reuse methods (incl. the
_merge_unique and _enrich_with_secret helpers) into
sqlmodel_repo/credentials.py.
2026-04-28 15:00:04 -04:00
5b1af331b9 refactor(db): extract CanaryMixin
Moves the 13 canary blob/token/trigger methods into
sqlmodel_repo/canary.py.
2026-04-28 14:55:52 -04:00
03b3c8855c refactor(db): extract OrchestratorMixin
Moves the 9 orchestrator event/email log + prune methods into
sqlmodel_repo/orchestrator.py.
2026-04-28 14:54:20 -04:00
555cd13f09 refactor(db): extract RealismMixin
Moves the 8 synthetic-file + realism-config methods into
sqlmodel_repo/realism.py.
2026-04-28 14:52:59 -04:00
9b845269c9 refactor(db): extract LogsMixin
Moves the 8 log methods (incl. get_stats_summary aggregator) into
sqlmodel_repo/logs.py. get_log_histogram remains an abstract dialect
override point; sqlite/mysql subclasses still override it via MRO.
2026-04-28 14:51:35 -04:00
a0aeba5abc refactor(db): extract FleetMixin and promote JSON helpers
Moves the 6 fleet-decky methods (incl. cross-source list_running_deckies
aggregator) into sqlmodel_repo/fleet.py. _serialize_json_fields and
_deserialize_json_fields move to _helpers.py since they're shared
across fleet, topology, and canary.
2026-04-28 14:50:01 -04:00
d989cd0461 refactor(db): extract WebhooksMixin
Moves the 9 webhook-subscription methods (CRUD + delivery
bookkeeping) into sqlmodel_repo/webhooks.py.
2026-04-28 14:47:42 -04:00
167f140b0e refactor(db): extract BountiesMixin
Moves the 5 bounty methods plus the cross-table purge_logs_and_bounties
helper into sqlmodel_repo/bounties.py.
2026-04-28 14:46:39 -04:00
c6804d79b6 refactor(db): extract DeckiesMixin
Moves the 4 decky-shard CRUD methods into sqlmodel_repo/deckies.py.
2026-04-28 14:45:15 -04:00
eebf9e4c97 refactor(db): extract AuthMixin
Moves the 7 user CRUD methods into sqlmodel_repo/auth.py.
_ensure_admin_user stays in __init__.py so DECNET_ADMIN_PASSWORD
remains addressable at the module path tests already monkeypatch.
2026-04-28 14:43:49 -04:00
99adbebe75 refactor(db): extract SwarmMixin
Moves the 7 swarm-host CRUD methods into sqlmodel_repo/swarm.py.
2026-04-28 14:42:58 -04:00
85c914e754 refactor(db): extract AttackerIntelMixin
Moves upsert_attacker_intel, get_attacker_intel_by_uuid,
and get_unenriched_attackers into sqlmodel_repo/attacker_intel.py.
Composed onto SQLModelRepository via mixin inheritance.
2026-04-28 14:40:36 -04:00
e16f47ad24 refactor(db): extract _safe_session/_detach_close to _helpers.py
Module-level session helpers move into sqlmodel_repo/_helpers.py.
__init__.py re-exports them so external import paths
(decnet.web.db.sqlmodel_repo._safe_session) keep resolving.
2026-04-28 14:38:26 -04:00
4167345d51 refactor(db): convert sqlmodel_repo.py to a package
Pure rename — the old monolithic 3505-line file becomes
decnet/web/db/sqlmodel_repo/__init__.py. No code changes.
Subsequent commits will extract per-domain mixins out of __init__.py
to mirror the topical layout used by decnet/web/db/models/.
2026-04-28 14:37:18 -04:00
6d8c90777d chore: remove vulture-flagged dead code, add whitelist
- plain.py: drop `or True` short-circuit + unreachable return; drop now-unused _HASH_HINTS
- ingester.py: drop unused `current_position` param from _flush_batch
- vulture_whitelist.py: document remaining false positives (FastAPI Depends side-effects, IMAP uid_mode where UID==seq)
2026-04-28 14:30:12 -04:00
b994250ef6 dev(ci): added CVE-2026-3219 to ignore vulns; no fix is yet available
Some checks failed
CI / Lint (ruff) (push) Successful in 13s
CI / SAST (bandit) (push) Successful in 21s
CI / Dependency audit (pip-audit) (push) Successful in 34s
CI / Test (Standard) (3.11) (push) Successful in 13m21s
CI / Test (Live) (3.11) (push) Successful in 1m45s
CI / Test (Fuzz) (3.11) (push) Failing after 3h1m24s
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
2026-04-28 14:24:57 -04:00
b4adc7246f fixed: deleted line from pyproject
Some checks failed
CI / Lint (ruff) (push) Successful in 17s
CI / SAST (bandit) (push) Successful in 24s
CI / Dependency audit (pip-audit) (push) Failing after 32s
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-28 13:03:14 -04:00
674ac7dd13 test(db): cover BaseRepository.update_identity_fingerprints
DummyRepo couldn't instantiate — TLS-cert fingerprint rollup added a new
abstract method without a stub here. Add the override and a call site so
the abstract pass body is hit.
2026-04-28 13:01:37 -04:00
cc6abf7256 fix(tests/stress): eliminate 0-request flakes in locust runs
Three independent issues conspired to make stress tests record 0 requests:

1. Every virtual user did /auth/login in on_start. With 1000 users in a
   spike window, bcrypt-bound logins never finished and on_start failed
   for all users — aggregated requests stayed at 0. Pre-fetch a single
   admin token in the fixture (cached per-host) and pass it via
   DECNET_STRESS_TOKEN so locust users skip the login storm.

2. Locust exits non-zero on any request failure by default, causing
   run_locust to throw away an otherwise valid stats CSV. Pass
   --exit-code-on-error 0 so per-test assertions are the only fail gate.

3. test_stress_sustained ran two locust subprocesses against the same
   uvicorn. Phase 1's keep-alive connections wedged phase 2 into 0
   recorded requests ~2/3 of the time. Refactored stress_server into a
   start_stress_server() context manager and gave each phase its own
   uvicorn.

Stable 3/3 on full suite, 3/3 on test_stress_sustained alone.
2026-04-28 13:01:11 -04:00
681931d9bb docs(roadmap): tick certificate details and three sibling roadmap items 2026-04-28 11:41:17 -04:00
72cc928ebf feat(prober-cert): roll up fingerprints onto AttackerIdentity
Brings the federation-gossip columns on AttackerIdentity to life —
ja3_hashes, hassh_hashes, and the new tls_cert_sha256 — by projecting
the union of every member observation's fingerprints JSON onto the
identity at clusterer create / link / merge time.

- decnet/profiler/identity_rollup.py: pure extract_fp_summaries()
  reads the production bounty shape (payload.fingerprint_type +
  payload.{ja3,hash,cert_sha256}) and returns deduped+sorted JSON
  list[str] per family, or None when a family has no signal so the
  column stays NULL instead of '[]'.
- BaseRepository.update_identity_fingerprints + SQLModel impl: one
  idempotent write that overwrites the three summary columns and
  bumps updated_at.
- ConnectedComponentsClusterer: after every per-component
  reconciliation (fresh-create OR existing-merge+link), recomputes
  and writes the rollup for the target identity. Wrapped in a
  best-effort helper so a write failure logs but never breaks the
  tick.
- Tests: extract_fp_summaries unit (dedup, sort determinism,
  unknown types ignored, malformed JSON, nested-stringified
  payloads, non-string values); end-to-end clusterer ticks
  populate the columns on create + on later observation links;
  no-fingerprint clusters keep the columns NULL.
2026-04-28 11:28:54 -04:00
9ab43b4ea4 feat(prober-cert): UI for active TLS certificate captures
- FpCertificate renders the new cert_sha256 field (truncated, with
  full hash on hover) and a FROM line carrying the prober-side
  target_ip/port so the source is visible.
- tls_certificate payloads split on target_ip presence: prober certs
  land under ACTIVE PROBES, sniffer certs under PASSIVE FINGERPRINTS.
  Two synthetic fpType keys (tls_certificate_active /
  tls_certificate_passive) drive the bucketing without disturbing
  the on-the-wire fingerprint_type.
2026-04-28 11:23:34 -04:00
5f8149daee feat(prober-cert): capture leaf TLS cert after successful JARM
JARM probes are crafted ClientHellos with weird ciphers — they never
complete a real handshake, so the peer cert isn't reachable from
those sockets. After a non-empty JARM hash proves the port speaks
TLS, do a separate ssl.wrap_socket() against the same (ip, port) to
fetch and parse the leaf cert.

- decnet/prober/tlscert.py: fetch + parse via cryptography lib;
  swallows all connect/handshake/parse failures (returns None).
- decnet/prober/worker.py::_capture_tls_cert: emits a tls_certificate
  event with subject_cn / issuer / SANs / validity / SHA-256 +
  publishes on the bus. Wired from _jarm_phase only when JARM
  succeeds, so non-TLS ports never trigger a second connect.
- Tests cover happy path, cert-fetch failure, defense-in-depth crash,
  empty-JARM skip, publish_fn, and parser edge cases (garbage DER,
  empty bytes, missing SAN extension, non-self-signed).
2026-04-28 11:14:44 -04:00
4749c972e5 feat(prober-cert): schema for active TLS cert capture
Adds storage for TLS certificate details collected from attacker-run
servers by the active prober (sibling to the existing JARM probe).

- AttackerIdentity.tls_cert_sha256 / Campaign.tls_cert_sha256:
  JSON list[str] columns mirroring ja3_hashes / hassh_hashes for
  federation gossip.
- ingester clause 9b: emits a 'tls_certificate' fingerprint bounty
  when a prober event carries subject_cn (disjoint from the existing
  sniffer-gated clause).
- Prober-side capture (ssl.wrap_socket follow-up after JARM) and
  profiler rollup land in sibling commits.
2026-04-28 11:09:25 -04:00
e986e81421 fix(test-schemathesis): drop unsupported_method check
The check expects 405 for any HTTP method not declared on a path.
DECNET's topology router has a static `/topologies/services` (GET only)
sibling to a parameterized `/topologies/{topology_id}` (DELETE), so a
DELETE on the static path falls through to the parameterized route and
hits auth, which returns 401 — by design. Leaking 405-vs-401 would let
unauthenticated callers enumerate valid topology UUIDs.

The same shape applies to other static/dynamic sibling pairs across
the API. The check is fundamentally incompatible with that routing
strategy; document the omission inline.
2026-04-28 10:20:43 -04:00
ccc8619387 fix(test-schemathesis): disable rate limiter in fuzz subprocess
Schemathesis fires up to 3000 examples per endpoint. POST /auth/login
caps at 10/5min per IP, so the second example onward returns 429 and
the positive_data_acceptance check flags it as RejectedPositiveData
(its allowed-status list is hardcoded in schemathesis to
2xx/401/403/404/409/5xx, so OpenAPI tweaks can't fix it).

DECNET_LIMITER_ENABLED=false exists for exactly this case (see
limiter.py docstring on stress/load testing).

Reverts the custom_openapi shim from 5d88346 / 9b1168c — the endpoint
already declares 429 in its responses= map (api_login.py:38), and the
shim turned out to address a problem that wasn't there. Drop the
companion test along with it.
2026-04-28 09:51:49 -04:00
9b1168ce0b fix(api): scope 429 OpenAPI injection to rate-limited routes
Previous commit advertised 429 on every operation. Only routes
decorated with @limiter.limit can actually return slowapi's 429 —
currently just POST /api/v1/auth/login. Documenting it elsewhere is
dishonest and would mislead clients into expecting a response the
server cannot produce.

Walk slowapi's _route_limits / _dynamic_route_limits registries to
identify decorated endpoints, match them to FastAPI routes by
{module}.{name}, and only inject 429 on those.

Existing per-route 429 declarations (e.g. SSE connection-cap on
events streams via sse_limits) are untouched.
2026-04-28 01:00:34 -04:00
5d883466a2 fix(api): advertise 429 on every operation in OpenAPI
SlowAPI middleware can short-circuit any request with 429 once a
per-route or per-IP rate limit fires (e.g. POST /api/v1/auth/login is
capped at 10/5min). The OpenAPI spec did not declare 429 on any
operation, so schemathesis flagged legitimate rate-limit responses as
RejectedPositiveData / status-code-nonconformance failures.

Override app.openapi to inject a generic 429 response object on every
HTTP operation in the generated schema. Add a contract test that fails
if any operation drops the 429 advertisement.
2026-04-28 00:58:37 -04:00
6b407e8c9c fix(tests): align stale tests with current behavior
- swarm/test_swarm_api, swarm/test_heartbeat: replace deprecated
  asyncio.get_event_loop().run_until_complete() with asyncio.run();
  the former raises in 3.11 once another test has set+closed a loop on
  the main thread.
- prober/test_prober_bus, prober/test_prober_worker: extend tcp_fingerprint
  mocks with tos/dscp/ecn/server_isn so the worker doesn't KeyError into
  the prober_error branch.
- services/test_service_isolation: collector now retries on event-stream
  errors instead of exiting; assert it stays running and cancel cleanly.
- live/test_imap_live, live/test_pop3_live: log format emits
  outcome="failure", not "failed".
- live/test_service_isolation_live: is_service_container accepts label
  OR state-name; rewrite the empty-state test against a synthetic
  unlabeled container instead of the host's real fleet.
2026-04-28 00:44:40 -04:00
8344b539c8 fix(ssh-template): drop sshd/pam_unix native chatter at rsyslog
OpenSSH's native syslog ("Failed password", "Connection from",
"Connection closed by …") and the pam_unix lines emitted from sshd's
PAM stack add no signal beyond what auth-helper already captures as
structured login_attempt events. They cluttered the dashboard and
arrived without an SD wrapper, forcing prose-IP heuristics in the
collector.

Add a `:programname, isequal, "sshd" stop` rule above the forwarding
actions in /etc/rsyslog.d/50-journal-forward.conf. pam_unix lines from
sshd inherit programname=sshd so the same rule covers both. sudo /
login / su pam_unix lines keep flowing (different programname), so
post-login privilege escalation telemetry is preserved.
2026-04-27 23:26:53 -04:00
9350ce195a fix(collector,correlation): extract attacker IP from sshd/pam free-form prose
Native sshd and pam_unix lines route through rsyslog without the
relay@55555 SD wrapper and without key=value pairs, so attacker_ip
fell through to "Unknown". Add a prose-IP fallback to both parsers:
anchored patterns (from/rhost/client/src) win first so we never pick
the local listener in "Connection from X port Y on Z port 22", with
a bare-IPv4 scan as the last resort.
2026-04-27 23:16:42 -04:00
3c571cce5a fix(correlation): prober events no longer count as attacker traversal
The prober writes events with hostname=decnet-prober and target_ip=
<the attacker being fingerprinted>. The parser pulls target_ip into
attacker_ip (it's one of _IP_FIELDS), which is correct for indexing
fingerprints under the attacker — but it had a side effect: every
fingerprinted attacker had two distinct deckies on file (the real
decoy they touched + decnet-prober) and the correlation engine's
traversals() classified that as lateral movement. Live dashboard
showed bogus "dmz-gateway -> decnet-prober" paths and TRAVERSAL
badges on attackers who'd done nothing but knock on the front door.

The prober is internal infrastructure, not a hop. Filter the
"decnet-" namespace out of distinct-decky counts and hop paths in
the engine. Fingerprints stay attached to the attacker profile via
the existing per-IP event index — just no longer as traversal.
2026-04-27 23:02:23 -04:00
e03a6d10a0 fix(collector): retry on event-stream errors and add periodic reconciler
Hit live on first VPS deploy: a window between the initial
client.containers.list() snapshot and the client.events() start-event
stream let topology service containers slip through, requiring an
operator restart for them to be picked up.

Two fixes:

* `_watch_events` now wraps the events() call in a retry loop with
  exponential backoff (1s -> 30s cap). A docker.errors.APIError, daemon
  reload, or SDK stream-decode hiccup used to make the executor task
  return cleanly, leaving the collector "running" with no event
  subscription. Future container starts were silently dropped until
  the unit was restarted.

* New `_reconcile_loop` async task ticks every
  DECNET_COLLECTOR_RECONCILE_S (default 30s), re-scans
  client.containers.list(), and calls _spawn for any service container
  not already in `active`. Belt to the event watcher's suspenders:
  even if a start event is dropped during a reconnect window, the
  reconciler picks it up within one cycle. Also prunes finished
  futures from `active` so the dict's bounded by current container
  count rather than agent lifetime churn.
2026-04-27 22:56:13 -04:00
c5db1d7ba2 fix(config-ini): strip inline # and ; comments from values
The module docstring teaches inline comments — `mode = master    # or
"agent"` is the canonical example for the [decnet] section. Python's
configparser ignores those by default unless inline_comment_prefixes
is set explicitly, so the comment became part of the value and
downstream validators rejected it ("mode must be 'agent' or 'master',
got 'master                       # or \"agent\"'").

Hit live on first VPS deploy: every CLI invocation crashed at import
time with a stack trace that didn't make it obvious the docstring's
example was the trigger. Now the parser does what the docs promise.
2026-04-27 22:55:58 -04:00
0b1a17b4eb fix(agent): pass --always-recreate-deps so service netns shares stay fresh
Decky service containers join their base via `network_mode:
container:<base>` and Docker binds that share at service start time. If
`docker compose up` recreates a base (e.g. ports: changes after a
forwards_l3 toggle) but decides services are unchanged, services keep
a stale FD into the destroyed namespace and end up with only `lo` — so
external traffic hits a closed port on the live base and gets RST.

Hit live on the first VPS deploy: external SSH to the dmz-gateway was
refused while sshd was listening, because base and service netns
inodes had drifted apart. `--always-recreate-deps` makes compose
rebuild every dependent whenever its base is recreated, removing the
race entirely.
2026-04-27 22:55:48 -04:00
0a525ebd37 fix(web): proxy follows DECNET_API_HOST instead of hardcoding 127.0.0.1
The dashboard's /api/* proxy hardcoded 127.0.0.1 as the target host.
That works when the API binds to a wildcard or to loopback, but
breaks the moment an operator binds the API to a specific address —
e.g. a Tailscale IP for tailnet-only deploys: the API stops listening
on loopback entirely and the proxy gets ECONNREFUSED on every request.

The web command now reads DECNET_API_HOST and falls back to loopback
only when the API is on a wildcard (0.0.0.0 / :: / unset). A new
--api-host flag overrides at the CLI level.
2026-04-27 22:55:25 -04:00
673bc5b819 ops(init): ship logrotate config so /var/log/decnet can't fill the disk
Without rotation, the syslog listener and per-host collector grow
/var/log/decnet/ without bound — a noisy attacker (or an active
probe storm) fills the disk in hours on a small VPS. New
deploy/logrotate.d/decnet caps at 7 daily rotations or 100 MiB,
whichever comes first, and uses copytruncate because the ingester
and forwarder hold the files open via Python and won't reopen on
a rename rotation.

Wire install / remove into `decnet init` and `decnet init --deinit`
alongside the existing tmpfiles.d / polkit handling.
2026-04-27 21:26:13 -04:00
5415e98458 sec(api): mode-gate and eager-load JWT secret in lifespan
Refuse to start decnet.web.api when DECNET_MODE=agent (unless the
operator explicitly opts into dual-role with DECNET_DISALLOW_MASTER=
false). The Typer CLI already hides master-only commands on agents,
but a misconfigured systemd unit or a direct uvicorn invocation
would bypass that — now the lifespan itself refuses, before any
worker, DB or bus comes up.

Resolve DECNET_JWT_SECRET eagerly at startup so a missing or known-
bad value fails at boot rather than on the first auth-gated request.
The lazy-load shape stays useful for non-master CLIs.
2026-04-27 21:26:03 -04:00
1a7da33375 sec(env): refuse to start master API with footgun public-binding config
Add validate_public_binding() called from the master API lifespan: when
DECNET_API_HOST is non-loopback, refuse to start if DECNET_CORS_ORIGINS
still contains a loopback origin (catches the "operator flipped to
0.0.0.0 to make it work and forgot to update CORS" footgun) or if
DECNET_CANARY_HTTP_BASE is plaintext http:// to a non-loopback host.
Log CRITICAL when DECNET_LIMITER_ENABLED=false on a public binding.
The validator no-ops under pytest so unrelated suites don't trip on it.

Add DECNET_VERIFY_HOSTNAME env knob; AgentClient and UpdaterClient
consult it when verify_hostname is None, giving production deploys
TLS hostname verification on top of the existing CA + fingerprint pin.
Default off so dev enrollments with mismatched SANs keep working.
2026-04-27 21:15:15 -04:00