26 Commits

Author SHA1 Message Date
f2b3393669 chore: relicense to AGPL-3.0-or-later and add SPDX headers
Replaces LICENSE (GPLv3 -> AGPLv3) and prepends
`SPDX-License-Identifier: AGPL-3.0-or-later` to every source file
across decnet/, decnet_web/, tests/, scripts/, and tools/.

Rationale: closes the GPLv3 ASP loophole so any party operating a
modified DECNET as a network service must offer their modified
source. Personal copyright (Samuel Paschuan) + inbound=outbound
contributions make a future unilateral relicense infeasible.

- LICENSE: full AGPL-3.0 text (gnu.org/licenses/agpl-3.0.txt)
- COPYRIGHT: project copyright notice
- tools/add_spdx_headers.py: idempotent header injector
  (shebang- and PEP 263-aware)

Touches 1565 source files (.py, .ts, .tsx, .js, .jsx, .css, .sh).
No behavior change; comments only.
2026-05-22 21:04:16 -04:00
1b90048715 fix(api): /deckies/deploy becomes additive by default
The wizard POSTs only the new decky on each submit. The handler used to
treat every INI as the complete desired fleet (config.deckies = INI) so
the reconciler tore down prior deckies as orphans — deploying a second
Windows workstation silently wiped the first.

Add replace_fleet to DeployIniRequest (default false). Default path
merges new deckies into existing config and rejects name/IP collisions
with 409. replace_fleet=true preserves set-desired-state semantics for
CLI / declarative callers. Lifecycle rows are created only for the
deckies submitted in the current call, so /deckies/lifecycle?ids=...
reflects exactly what this submit deployed.

build_deckies_from_ini gains reserved_ips so additive auto-allocation
skips IPs already held by the existing fleet.
2026-05-22 18:14:50 -04:00
eacac9aa60 feat(api): GET /deckies/lifecycle + master startup sweep
GET /deckies/lifecycle?ids=<uuid>&ids=<uuid> returns the matching
DeckyLifecycle rows so the wizard can poll instead of holding an HTTP
request open across compose work. require_viewer gating -- read-only.

Startup sweep: on master boot, any pending/running row with
started_at older than 1h flips to failed with
error='master restarted during operation'. Pre-v1 substitute for a
durable task queue: if the master crashes mid-deploy, the wizard sees
FAILED on refresh and the operator retries. Idempotent + cheap; runs
unconditionally including in contract-test mode.
2026-05-22 16:44:17 -04:00
4743c8f733 feat(api): /deckies/deploy and /mutate become 202 fire-and-forget
This is the unblock for the wizard hang. Both endpoints used to run
docker compose synchronously inside the HTTP handler -- on master
(unihost) or via asyncio.gather of worker /deploy POSTs at 600s
timeout each (swarm) -- blocking every other API request.

New flow:
  1. Commit the new config shape to repo state (fast).
  2. Create one DeckyLifecycle row per decky (status=pending).
  3. Spawn asyncio.create_task(run_deploy / run_mutate) -- the
     lifecycle runner drives rows through running -> succeeded|failed
     and emits decky.<name>.lifecycle on the bus.
  4. Return 202 with {lifecycle_ids: [...]}. Wizard polls
     GET /deckies/lifecycle?ids=... (next commit).

mutator/engine.py gains pick_new_services() -- shared between the
async API path and the watch-loop's synchronous mutate_decky().

DeployResponse grows lifecycle_ids[]. The old dispatch_decnet_config
helper still exists for the CLI swarm-deploy command path; it just
isn't called from the API handler anymore.

Test changes: 200 -> 202, drop dispatch_decnet_config mocks (handler
no longer calls it), assert lifecycle_ids in response + committed
state matches expectations.
2026-05-22 16:40:55 -04:00
0f90dcfd3e fix(types): T3 — narrow str|None at 12 sites; fix LANRow/DeckyRow subscript in mutator tests 2026-05-01 02:18:53 -04:00
8814902999 docs(api): clarify fleet_deckies + JSON dual-write happens in engine.deployer
The unihost API path delegates to engine.deployer.deploy(), which now
writes both decnet-state.json (existing) and the fleet_deckies DB
table (added in 646aeec).  Comment makes the single-sink design
explicit so future maintainers don't add a parallel save_state /
upsert_fleet_decky call here.

No behavioral change — every fleet-creation path on every host (CLI
deploy, this unihost API path, and per-worker SWARM agent deploys)
already routes through the engine.deployer single sink.
2026-04-26 21:08:44 -04:00
df84981954 feat(api): pin response_model on dict-returning mutation routes
Every mutation route that returned an untyped dict now declares
response_model at the decorator. MessageResponse covers the eight
{"message": ...} envelopes (change-password, mutate-decky, mutate-
interval, update-deployment-limit, update-global-mutation-interval,
delete-user, update-user-role, reset-user-password). Purpose-built
models cover the richer shapes (DeployResponse for /deckies/deploy,
PurgeResponse for /config/reinit, ReapReportResponse for /reap-orphans,
UserResponse for /config/users). 204-No-Content and Response/
ORJSONResponse routes stay as-is.

The wire shape for clients is unchanged — the envelopes already only
shipped a message field. What changes is that a handler which
accidentally returns a richer dict (e.g. a full user row including
password_hash) would be silently stripped to the declared fields at
serialization time.

Also flips F4/D "expensive LIKE" to accepted (new DA-09) — the /logs
and /attackers search routes LIKE-scan unbounded columns, but both are
admin-gated, limit-capped, and operator rate-limit scope per DA-04.
FTS5 stays a performance TODO, not a security blocker.
2026-04-24 14:27:58 -04:00
8a2876fe86 fix(api): document missing HTTP status codes on router endpoints
All checks were successful
CI / Lint (ruff) (push) Successful in 16s
CI / SAST (bandit) (push) Successful in 18s
CI / Dependency audit (pip-audit) (push) Successful in 26s
CI / Test (Standard) (3.11) (push) Successful in 2m41s
CI / Test (Live) (3.11) (push) Successful in 1m6s
CI / Test (Fuzz) (3.11) (push) Successful in 1h9m14s
CI / Finalize Merge to Main (push) Has been skipped
CI / Merge dev → testing (push) Successful in 12s
CI / Prepare Merge to Main (push) Has been skipped
Schemathesis was failing CI on routes that returned status codes not
declared in their OpenAPI responses= dicts. Adds the missing codes
across swarm_updates, swarm_mgmt, swarm, fleet and attackers routers.

Also adds 400 to every POST/PUT/PATCH that accepts a JSON body —
Starlette returns 400 on malformed/non-UTF8 bodies before FastAPI's
422 validation runs, which schemathesis fuzzing trips every time.

No handler logic changed.
2026-04-20 15:25:02 -04:00
af9d59d3ee fixed(api): documentation 2026-04-20 13:20:42 -04:00
07ec4bc269 fix(fleet): INI fully replaces prior decky state on redeploy
Submitting an INI with a single [decky1] was silently redeploying the
deckies from the *previous* deploy too. POST /deckies/deploy merged the
new INI into the stored DecnetConfig by name, so a 1-decky INI on top of
a prior 3-decky run still pushed 3 deckies to the worker. Those stale
decky2/decky3 kept their old IPs, collided on the parent NIC, and the
agent failed with 'Address already in use' — the deploy the operator
never asked for.

The INI is the source of truth for which deckies exist this deploy.
Full replace: config.deckies = list(new_decky_configs). Operators who
want to add more deckies should list them all in the INI.

Update the deploy-limit test to reflect the new replace semantics, and
add a regression test asserting prior state is dropped.
2026-04-19 20:24:29 -04:00
6d7567b6bb fix(fleet): reset stale host_uuid on carried-over deckies before dispatch
Deckies merged in from a prior deployment's saved state kept their
original host_uuid — which dispatch_decnet_config then 404'd on if that
host had since been decommissioned or re-enrolled at a different uuid.
Before round-robin assignment, drop any host_uuid that isn't in the live
swarm_hosts set so orphaned entries get reassigned instead of exploding
with 'unknown host_uuid'.
2026-04-19 06:27:34 -04:00
79db999030 feat(fleet): auto-swarm deploy — shard across enrolled workers when master
POST /deckies/deploy now branches on DECNET_MODE + enrolled host presence:
when the caller is a master with at least one reachable swarm host, round-
robin host_uuids are assigned over new deckies and the config is dispatched
via AgentClient. Falls back to local docker-compose otherwise.

Extracts the dispatch loop from api_deploy_swarm into dispatch_decnet_config
so both endpoints share the same shard/dispatch/persist path. Adds
GET /system/deployment-mode for the UI to show 'will shard across N hosts'
vs 'will deploy locally' before the operator clicks deploy.
2026-04-19 06:09:08 -04:00
cb1a1d1270 fix(fleet): defer DecnetConfig build until deckies are expanded
Stateless /api/v1/deckies/deploy previously instantiated DecnetConfig with
deckies=[] so it could merge entries later — but DecnetConfig.deckies is
min_length=1, so Pydantic raised and the global handler mapped it to 422
'Internal data consistency error'. Construct the config after
build_deckies_from_ini returns at least one DeckyConfig.
2026-04-19 06:02:26 -04:00
2dd86fb3bb perf: cache /bounty, /logs/histogram, /deckies; bump /config TTL to 5s
Round-2 follow-up: profile at 500c/u showed _execute still dominating
the uncached read endpoints (/bounty 76%, /logs/histogram 73%,
/deckies 56%). Same router-level TTL pattern as /stats — 5s window,
asyncio.Lock to collapse concurrent calls into one DB hit.

- /bounty: cache default unfiltered page (limit=50, offset=0,
  bounty_type=None, search=None). Filtered requests bypass.
- /logs/histogram: cache default (interval_minutes=15, no filters).
  Filtered / non-default interval requests bypass.
- /deckies: cache full response (endpoint takes no params).
- /config: bump _STATE_TTL from 1.0 to 5.0 — admin writes are rare,
  1s was too short for bursts to coalesce at high concurrency.
2026-04-17 19:30:11 -04:00
89099b903d fix: resolve schemathesis and live test failures
- Add 403 response to all RBAC-gated endpoints (schemathesis UndefinedStatusCode)
- Add 400 response to all endpoints accepting JSON bodies (malformed input)
- Add required 'title' field to schemathesis.toml for schemathesis 4.15+
- Add xdist_group markers to live tests with module-scoped fixtures to
  prevent xdist from distributing them across workers (fixture isolation)
2026-04-16 01:39:04 -04:00
70d8ffc607 feat: complete OTEL tracing across all services with pipeline bridge and docs
Extends tracing to every remaining module: all 23 API route handlers,
correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp),
profiler behavioral analysis, logging subsystem, engine, and mutator.

Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on
the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace
correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords.

Includes development/docs/TRACING.md with full span reference (76 spans),
pipeline propagation architecture, quick start guide, and troubleshooting.
2026-04-16 00:58:08 -04:00
0ee23b8700 refactor: enforce RBAC decorators on all API endpoints
- Add @require_role() decorators to all GET/POST/PUT endpoints
- Centralize role-based access control per memory: RBAC null-role bug required server-side gating
- Admin (manage_admins), Editor (write ops), Viewer (read ops), Public endpoints
- Removes client-side role checks as per memory: server-side UI gating is mandatory
2026-04-15 12:51:05 -04:00
3d01ca2c2a fix: resolve ruff lint errors (unused import, E402 import order)
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 14s
CI / Dependency audit (pip-audit) (push) Successful in 27s
CI / Test (Standard) (3.11) (push) Successful in 2m7s
CI / Test (Standard) (3.12) (push) Successful in 2m8s
CI / Test (Live) (3.11) (push) Successful in 58s
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
CI / Test (Fuzz) (3.11) (push) Has been cancelled
2026-04-13 07:58:13 -04:00
035499f255 feat: add component-aware RFC 5424 application logging system
- Modify Rfc5424Formatter to read decnet_component from LogRecord
  and use it as RFC 5424 APP-NAME field (falls back to 'decnet')
- Add get_logger(component) factory in decnet/logging/__init__.py
  with _ComponentFilter that injects decnet_component on each record
- Wire all five layers to their component tag:
    cli -> 'cli', engine -> 'engine', api -> 'api' (api.py, ingester,
    routers), mutator -> 'mutator', collector -> 'collector'
- Add structured INFO/DEBUG/WARNING/ERROR log calls throughout each
  layer per the defined vocabulary; DEBUG calls are suppressed unless
  DECNET_DEVELOPER=true
- Add tests/test_logging.py covering factory, filter, formatter
  component-awareness, fallback behaviour, and level gating
2026-04-13 07:39:01 -04:00
f2cc585d72 fix: align tests with model validation and API error reporting 2026-04-13 01:43:52 -04:00
b2e4706a14 Refactor: implemented Repository Factory and Async Mutator Engine. Decoupled storage logic and enforced Dependency Injection across CLI and Web API. Updated documentation.
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 13s
CI / Dependency audit (pip-audit) (push) Successful in 22s
CI / Test (Standard) (3.11) (push) Failing after 54s
CI / Test (Standard) (3.12) (push) Successful in 1m35s
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-12 07:48:17 -04:00
c384a3103a refactor: separate engine, collector, mutator, and fleet into independent subpackages
- decnet/engine/ — container lifecycle (deploy, teardown, status); _kill_api removed
- decnet/collector/ — Docker log streaming (moved from web/collector.py)
- decnet/mutator/ — mutation engine (no longer imports from cli or duplicates deployer code)
- decnet/fleet.py — shared decky-building logic extracted from cli.py

Cross-contamination eliminated:
- web router no longer imports from decnet.cli
- mutator no longer imports from decnet.cli
- cli no longer imports from decnet.web
- _kill_api() moved to cli (process management, not engine concern)
- _compose_with_retry duplicate removed from mutator
2026-04-12 00:26:22 -04:00
25ba3fb56a feat: replace bind-mount log pipeline with Docker log streaming
Services now print RFC 5424 to stdout; Docker captures via json-file driver.
A new host-side collector (decnet.web.collector) streams docker logs from all
running decky service containers and writes RFC 5424 + parsed JSON to the host
log file. The existing ingester continues to tail the .json file unchanged.
rsyslog can consume the .log file independently — no DECNET involvement needed.

Removes: bind-mount volume injection, _LOG_NETWORK bridge, log_target config
field and --log-target CLI flag, TCP syslog forwarding from service templates.
2026-04-10 00:14:14 -04:00
016115a523 fix: clear all addressable technical debt (DEBT-005 through DEBT-025)
Security:
- DEBT-008: remove query-string token auth; header-only Bearer now enforced
- DEBT-013: add regex constraint ^[a-z0-9\-]{1,64}$ on decky_name path param
- DEBT-015: stop leaking raw exception detail to API clients; log server-side
- DEBT-016: validate search (max_length=512) and datetime params with regex

Reliability:
- DEBT-014: wrap SSE event_generator in try/except; yield error frame on failure
- DEBT-017: emit log.warning/error on DB init retry; silent failures now visible

Observability / Docs:
- DEBT-020: add 401/422 response declarations to all route decorators

Infrastructure:
- DEBT-018: add HEALTHCHECK to all 24 template Dockerfiles
- DEBT-019: add USER decnet + setcap cap_net_bind_service to all 24 Dockerfiles
- DEBT-024: bump Redis template version 7.0.12 → 7.2.7

Config:
- DEBT-012: validate DECNET_API_PORT and DECNET_WEB_PORT range (1-65535)

Code quality:
- DEBT-010: delete 22 duplicate decnet_logging.py copies; deployer injects canonical
- DEBT-022: closed as false positive (print only in module docstring)
- DEBT-009: closed as false positive (templates already use structured syslog_line)

Build:
- DEBT-025: generate requirements.lock via pip freeze

Testing:
- DEBT-005/006/007: comprehensive test suite added across tests/api/
- conftest: in-memory SQLite + StaticPool + monkeypatched session_factory
- fuzz mark added; default run excludes fuzz; -n logical parallelism

DEBT.md updated: 23/25 items closed; DEBT-011 (Alembic) and DEBT-023 (digest pinning) remain
2026-04-09 19:02:51 -04:00
de84cc664f refactor: migrate database to SQLModel and implement modular DB structure 2026-04-09 16:43:30 -04:00
29a2cf2738 refactor: modularize API routes into separate files and clean up dependencies 2026-04-09 11:58:57 -04:00