Commit Graph

35 Commits

Author SHA1 Message Date
ee24a7551f fix(types): T7 — eliminate all remaining 38 mypy errors; fix DeckyRow subscript in engine tests 2026-05-01 02:07:53 -04:00
e387acf79d fix(types): T4 — stop spreading TopologySummary as dict; fix heartbeat .get() and scope param 2026-05-01 01:51:43 -04:00
d637ff515e fix(types): T3 — narrow str|None at 12 sites; fix LANRow/DeckyRow subscript in mutator tests 2026-05-01 01:47:04 -04:00
f597ab2810 fix(types): T1 — remove 15 stale type: ignore comments confirmed unused by mypy 2026-05-01 01:26:24 -04:00
fc1f0914b7 refactor(topology): introduce TopologyRepository protocol with DTO return types
Replace repo: BaseRepository with a structural TopologyRepository protocol
in persistence.py and allocator.py. All read methods now return typed DTOs
(TopologySummary, LANRow, DeckyRow, EdgeRow) instead of raw dicts, eliminating
silent field-shape regressions across the topology subsystem.

TopologySummary gains email_personas and language_default so api_personas.py
can continue reading those fields via attribute access. hydrate() converts
DTOs to dicts before passing to _backfill_decky_configs, keeping the mutable
working-state function dict-based at its boundary. All production callers
(router handlers, mutator, CLI, heartbeat) migrated from dict/get access to
attribute access. 134 tests pass.
2026-04-30 23:51:41 -04:00
257857338c fix(api): replace threading.Lock with asyncio.Lock for hydration guard
await inside a threading.Lock yields to the event loop while the OS
thread still holds the lock — potential deadlock under FastAPI thread
pool dispatch. asyncio.Lock is the correct primitive for async
critical sections. Also fixed stale diurnal.py docstring that had the
delegation direction backwards.
2026-04-30 21:24:11 -04:00
a8c69155ff fix(planner): surface dropped weight entries in PUT /realism/config response
_parse_weights was silently dropping content_class values that don't
belong on their target list with no operator feedback. Changed it to
return (weights, dropped), apply_payload to collect and return all
dropped names, and put_config to include dropped_entries in the
response when non-empty.
2026-04-30 21:18:41 -04:00
ebe15310ab fix(api): hydrate planner from DB exactly once on first GET, not on every read
get_config was calling planner.apply_payload on every GET request, racing
concurrent reads on module-level globals. Added a _hydrated flag + lock
so DB hydration runs at most once per process lifetime; put_config marks
it done too. Test fixture resets the flag between tests.
2026-04-30 21:17:03 -04:00
f6422f2529 fix(heartbeat): replace remaining bare except Exception with SQLAlchemyError and typed builtins 2026-04-30 21:08:26 -04:00
542d129d6f refactor(services_live): replace string-sniffed error dispatch with typed exception subclasses
ServiceNotFoundError (→ 404) and ServiceConflictError (→ 409) replace the
"not found" / "already on" / "not on" substring checks in _map_mutation_error;
base ServiceMutationError still maps to 422. Fixes three pre-existing test
status-code assertions (201 vs 200 on POST endpoints).
2026-04-30 20:49:29 -04:00
a5487eb55f refactor(enroll-bundle): extract bundle_builder and move DTOs to swarm models
Pure tarball construction (_build_tarball, _render_*, _iter_included,
_SYSTEMD_UNITS) moved to decnet/swarm/bundle_builder.py — no FastAPI
dependency, independently testable. EnrollBundleRequest/Response moved
to decnet/web/db/models/swarm.py alongside the other swarm DTOs.
Router drops from 504 to 260 lines; keeps only the in-memory token
registry, sweeper, and endpoints.
2026-04-30 20:39:42 -04:00
e124f9e296 refactor(swarm): extract _shard_payload helper and promote _dispatch to module-level 2026-04-30 20:25:38 -04:00
c648d8b04e fix(heartbeat): replace bare except Exception with specific types and intent comments 2026-04-30 20:19:52 -04:00
bbb1762250 fix(export): one attacker per line in exported JSON 2026-04-30 10:45:03 -04:00
2ddba04f79 feat(attackers): add JSON export endpoint and download button 2026-04-30 10:43:46 -04:00
917f7e8e54 feat(tarpit): MazeNET topology-scoped tarpit — Inspector controls + topology API 2026-04-29 21:10:02 -04:00
5f4005c47a feat(tarpit): port-selective tc netem tarpit mode with live log events
- GET/POST/DELETE /api/v1/deckies/{name}/tarpit (admin write, viewer GET)
- get_container_veth() + get_container_pid() in network.py via iflink/ip-link
- TarpitRule SQLModel table + TarpitMixin repo (upsert/get/delete/list)
- Background tarpit_watcher_worker: polls /proc/{pid}/net/tcp every 15s,
  emits tarpit_enter/tarpit_exit log events (edge-triggered, with duration)
- tarpit_enabled/tarpit_disabled logs on operator POST/DELETE actions
2026-04-29 18:49:42 -04:00
94b06ee862 feat(services): initial config on ADD SERVICE — schema modal in DeckyCard, MazeNET drag, and Inspector
- DeckyServiceAddRequest gains an optional `config: dict` field, validated
  against the service's config_schema before any state mutation (400 on
  bad type, no half-written rows).
- Engine: add_service threads `config` into _add_topology_service /
  _add_fleet_service, persisting validated cfg to decky_config.service_config
  BEFORE compose regen so the first `up -d --build` materialises the env on
  the new container. No follow-up apply needed.
- Frontend: shared AddServiceConfigModal — same wizard accordion shape, used by:
    * DeckyCard's ADD SERVICE picker (Fleet & MazeNET inspectors via shared component)
    * MazeNET Inspector's ADD SERVICE picker
    * MazeNET palette drag-drop onto a deployed decky
  Empty-schema services short-circuit to a one-click add (no modal flash).
  Operator can cancel; errors surface in the modal.
- Tests: add_service config plumbing — persist, drop unknown keys, 400-equivalent
  on bad types, back-compat empty-config.
- Drive-by: fix stale repo-method names in test_services_live.py
  (create_topology_decky → add_topology_decky, get_topology_decky → list+pick helper,
  service.added → service_added topic).
2026-04-29 12:44:47 -04:00
75b1ce3a31 feat(api): per-service config schema endpoint + PUT/POST update+apply for fleet & topology
- GET /topologies/services/{name}/schema serves the declared ServiceConfigField
  metadata so the Inspector can auto-render forms.
- PUT  /(topologies/{id}/)deckies/{decky}/services/{svc}/config persists the
  validated dict (DB + compose); container untouched (Save).
- POST /(topologies/{id}/)deckies/{decky}/services/{svc}/apply persists then
  force-recreates <decky>-<svc> so the new env takes effect (Apply, destructive).
- New engine helper update_service_config wires both fleet and topology paths
  through the existing _persist_fleet_change / _rerender_topology_compose
  machinery; emits decky.<name>.service_config_changed on the bus.
2026-04-29 11:38:06 -04:00
bbed52a962 fix(bus): topic segments can't contain dots — service.added → service_added
Bus topic segments are NATS-style tokens and the validator at
bus/topics.py:402 rejects '.', '*', '>', whitespace.  My W3 constants
'service.added' / 'service.removed' tripped this on every live
add/remove call:

  ValueError: topic segment 'service.added' may not contain '.', ...

Renamed both to underscore form: DECKY_SERVICE_ADDED = 'service_added'.
Aligned the SSE forwarder's name mapping (decky.<name>.service_added →
SSE event 'decky.service_added') and the frontend's
useTopologyStream listener + MazeNET.tsx event handler.  Also updated
the wiki entry with a note about the underscore.
2026-04-28 23:53:25 -04:00
0e5484648f feat: forward decky.*.service.* on per-topology SSE stream
The /topologies/{id}/events SSE proxy now subscribes to two bus
patterns concurrently and merges them through a bounded asyncio.Queue:

* topology.{id}.>  — lifecycle (status, mutation.*) — unchanged.
* decky.>          — per-decky events, filtered by payload.topology_id
                     so a fleet decky sharing a name with a topology
                     decky doesn't leak across.

_sse_name_for routes 'decky.<name>.service.added' to the SSE event
name 'decky.service.added' (kept the prefix so the frontend doesn't
collide with topology lifecycle events that share leaf names like
'status').

useTopologyStream surfaces the two new event names; MazeNET.tsx's
onStreamEvent optimistically patches the matching node's services
list so a second tab reflects shape changes without a refetch.
2026-04-28 23:15:38 -04:00
06f208c86e feat: surface fleet_singleton flag on /topologies/services
Adds a fleet_singletons array to ServiceCatalogResponse so per-decky
add UIs can filter out services like LLMNR that run once fleet-wide
(and would 422 server-side at the live add endpoint).

The existing 'services: list[str]' field is unchanged for back-compat
with MazeNET/useMazeApi.ts:257; the new field is additive.

decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a
module-scoped cache (registry only changes on BYOS install / plugin
drop, neither of which happens mid-session) and exposes a precomputed
.perDecky list so consumers don't need to re-derive the diff.
2026-04-28 23:08:29 -04:00
6ac8cac908 feat(deckies): live service add/remove without full redeploy
decnet.engine.services_live exposes add_service / remove_service for
both fleet and topology decky scopes.  The host's _compose() wrapper
already supported per-service targeting (up --no-deps -d <svc>,
stop, rm -f); what was missing was the orchestration around it:

* add: validate against decnet.services.registry (rejects unknown +
  fleet_singleton); persist the new services list; re-render the
  per-scope compose file (so future redeploys reflect the change);
  run docker compose up -d --no-deps --build <decky>-<svc>.
* remove: stop + rm -f the service container; persist; re-render
  compose so a future up -d doesn't bring it back.

Both publish decky.<name>.service.added / .removed on the bus, with
the post-mutation services list.  Topic constants added to
decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md
ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored).

Four new admin endpoints:

* POST/DELETE /api/v1/deckies/{name}/services{,/svc}
* POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc}

ServiceMutationError messages are mapped at the API boundary to 404
(decky/topology missing), 409 (idempotency violation), 422 (unknown
or fleet_singleton service).
2026-04-28 22:51:42 -04:00
0bc4b05c73 feat(deckies): generic file drops on fleet + MazeNET deckies
Extracts the docker-exec-with-base64-stdin pattern out of canary/planter
and orchestrator/drivers/ssh into a shared decnet.decky_io package.
Both consumers now delegate; the canary planter test still proves the
contract end-to-end.

Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops.
Container resolution is shared with the canary path: topology_id absent
means fleet (<name>-ssh), present routes through resolve_decky_container
which picks <name>-ssh when the topology decky exposes ssh, else the
topology base container decnet_t_<id8>_<name>.

Path validation rejects relative paths and '..' traversal at the request
model layer.  Bad base64 → 400; unknown topology → 404; decky not in
topology → 422; docker exec failure → 409.
2026-04-28 22:43:34 -04:00
3fe999d706 feat(canary): allow custom canaries on MazeNET deckies via API
POST /api/v1/canary/tokens grows an optional topology_id field.  When
present, the server hydrates the topology, validates the named decky is
in it, and resolves the docker container via
planter.resolve_topology_container — <name>-ssh if the decky exposes ssh,
else the topology base container.  Absent ⇒ fleet semantics, unchanged.

The token row gets a nullable topology_id column (no migration helper
per pre-v1 policy).  GET /api/v1/canary/tokens accepts ?topology_id= as
a filter.  DELETE re-resolves the container at revoke time so a
redeployed topology is still reachable.

422 when the named decky isn't in the topology; 404 when the topology
itself doesn't exist.
2026-04-28 22:34:45 -04:00
862e4dbb31 merge: testing → main (reconcile 2-week divergence) 2026-04-28 18:36:00 -04:00
f2cc585d72 fix: align tests with model validation and API error reporting 2026-04-13 01:43:52 -04:00
b2e4706a14 Refactor: implemented Repository Factory and Async Mutator Engine. Decoupled storage logic and enforced Dependency Injection across CLI and Web API. Updated documentation.
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 13s
CI / Dependency audit (pip-audit) (push) Successful in 22s
CI / Test (Standard) (3.11) (push) Failing after 54s
CI / Test (Standard) (3.12) (push) Successful in 1m35s
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-12 07:48:17 -04:00
c384a3103a refactor: separate engine, collector, mutator, and fleet into independent subpackages
- decnet/engine/ — container lifecycle (deploy, teardown, status); _kill_api removed
- decnet/collector/ — Docker log streaming (moved from web/collector.py)
- decnet/mutator/ — mutation engine (no longer imports from cli or duplicates deployer code)
- decnet/fleet.py — shared decky-building logic extracted from cli.py

Cross-contamination eliminated:
- web router no longer imports from decnet.cli
- mutator no longer imports from decnet.cli
- cli no longer imports from decnet.web
- _kill_api() moved to cli (process management, not engine concern)
- _compose_with_retry duplicate removed from mutator
2026-04-12 00:26:22 -04:00
25ba3fb56a feat: replace bind-mount log pipeline with Docker log streaming
Services now print RFC 5424 to stdout; Docker captures via json-file driver.
A new host-side collector (decnet.web.collector) streams docker logs from all
running decky service containers and writes RFC 5424 + parsed JSON to the host
log file. The existing ingester continues to tail the .json file unchanged.
rsyslog can consume the .log file independently — no DECNET involvement needed.

Removes: bind-mount volume injection, _LOG_NETWORK bridge, log_target config
field and --log-target CLI flag, TCP syslog forwarding from service templates.
2026-04-10 00:14:14 -04:00
6b8392102e fix: emit stats/histogram snapshot on SSE connect; remove polling api.get('/stats') from Dashboard 2026-04-09 19:23:24 -04:00
d2a569496d fix: add get_stream_user dependency for SSE endpoint; allow query-string token for EventSource 2026-04-09 19:20:38 -04:00
016115a523 fix: clear all addressable technical debt (DEBT-005 through DEBT-025)
Security:
- DEBT-008: remove query-string token auth; header-only Bearer now enforced
- DEBT-013: add regex constraint ^[a-z0-9\-]{1,64}$ on decky_name path param
- DEBT-015: stop leaking raw exception detail to API clients; log server-side
- DEBT-016: validate search (max_length=512) and datetime params with regex

Reliability:
- DEBT-014: wrap SSE event_generator in try/except; yield error frame on failure
- DEBT-017: emit log.warning/error on DB init retry; silent failures now visible

Observability / Docs:
- DEBT-020: add 401/422 response declarations to all route decorators

Infrastructure:
- DEBT-018: add HEALTHCHECK to all 24 template Dockerfiles
- DEBT-019: add USER decnet + setcap cap_net_bind_service to all 24 Dockerfiles
- DEBT-024: bump Redis template version 7.0.12 → 7.2.7

Config:
- DEBT-012: validate DECNET_API_PORT and DECNET_WEB_PORT range (1-65535)

Code quality:
- DEBT-010: delete 22 duplicate decnet_logging.py copies; deployer injects canonical
- DEBT-022: closed as false positive (print only in module docstring)
- DEBT-009: closed as false positive (templates already use structured syslog_line)

Build:
- DEBT-025: generate requirements.lock via pip freeze

Testing:
- DEBT-005/006/007: comprehensive test suite added across tests/api/
- conftest: in-memory SQLite + StaticPool + monkeypatched session_factory
- fuzz mark added; default run excludes fuzz; -n logical parallelism

DEBT.md updated: 23/25 items closed; DEBT-011 (Alembic) and DEBT-023 (digest pinning) remain
2026-04-09 19:02:51 -04:00
de84cc664f refactor: migrate database to SQLModel and implement modular DB structure 2026-04-09 16:43:30 -04:00
29a2cf2738 refactor: modularize API routes into separate files and clean up dependencies 2026-04-09 11:58:57 -04:00