DECNET

Author	SHA1	Message	Date
anti	67c2e30f89	feat(profiler): publish attacker.scored per profile upsert (DEBT-031 worker 4) The profiler worker threads its bus publisher through _WorkerState so _update_profiles can emit a compact attacker.scored event for every upsert. Payload carries the headline counts (event/service/decky/bounty/credential) plus is_traversal, so the MazeNET attacker pool can redraw without a round-trip. Bus stays optional: publish_attacker=None when DECNET_BUS_ENABLED=false or get_bus() fails, and hook exceptions are logged without breaking the upsert path.	2026-04-21 16:54:40 -04:00
anti	e51b65d7c3	feat(correlation,profiler): publish attacker.observed on first sighting (DEBT-031 worker 3) CorrelationEngine gains an optional publish_fn hook fired once per unique attacker IP. The profiler worker — sole caller of the engine today — carries the bus physically, builds a thread-safe publisher, and wraps it with the attacker.observed topic before handing it in. Bus stays optional: if get_bus() fails or DECNET_BUS_ENABLED=false, the engine runs publish_fn=None and the worker degrades to DB-only. Hook failures log a warning and never break ingestion.	2026-04-21 16:53:03 -04:00
anti	34d9e37ab0	feat(prober): publish attacker.fingerprinted on the bus (DEBT-031) Each successful JARM / HASSH / TCPfp probe fans out an attacker.fingerprinted event; the probe family goes in event.type so a single subscription covers all three. Payload carries the attacker IP, port, and probe-specific hash — enough for the MazeNET live map to render fingerprint info on observed attackers. Lifts the thread-safe publisher helper out of the sniffer worker into decnet/bus/publish.py so the prober (and every future worker with a to_thread hot path) can reuse it without copy-pasting the run_coroutine_threadsafe dance. Sniffer rewires onto the shared helper in passing. Adds ATTACKER_FINGERPRINTED as a new leaf — distinct from ATTACKER_OBSERVED (correlator's first-sight signal) because an active probe result is additional evidence about an already-observed attacker. Note: the plan's decky.{id}.state realism-probe publish path is deferred — the current prober fingerprints attackers, not decky realism. Will revisit when realism probes exist.	2026-04-21 16:47:55 -04:00
anti	7f497ac552	feat(sniffer): publish decky.{id}.traffic on the bus (DEBT-031) SnifferEngine gains an optional publish_fn hook, invoked after the dedup + syslog write for traffic-summary events only (tls_session, tcp_flow_timing, tcp_syn_fingerprint) — intermediate parser artifacts like tls_client_hello stay off the bus. The sniffer worker wires get_bus() + a thread-safe shim that marshals sync calls from the scapy sniff thread back onto the asyncio loop via run_coroutine_threadsafe. Bus failure at startup degrades cleanly to publish-off mode; publish failures at runtime never escape the sniff thread.	2026-04-21 16:35:50 -04:00
anti	f3eaab5d37	refactor(bus): extract publish_safely + extend topics for DEBT-031 Shared publish_safely helper at decnet/bus/publish.py so the nine workers about to be wired into the bus don't each copy-paste the "never raise back at the caller" contract. Mutator drops its private copy and imports the canonical one. topics.py gains the attacker.* hierarchy (observed, scored, session.started, session.ended) and a system_health(worker) builder for per-worker health heartbeats — both prerequisites for the worker rollout under DEBT-031.	2026-04-21 16:32:30 -04:00
anti	1968f6e741	test(mutator,web): cover bus publishes, bus-wake, and SSE events route - tests/topology/test_mutator.py: reconcile_topologies publishes applying+applied on success, applying+failed+status on failure; and stays safe when bus=None. _wake_on_enqueue sets its asyncio.Event on every matching enqueue event. - tests/api/topology/test_mutations.py: POST /mutations publishes mutation.enqueued after a successful DB write, via a FakeBus injected in place of the app-wide bus singleton. - tests/api/topology/test_events_stream.py: SSE route returns 401 unauthenticated, 404 for unknown topologies, and (driving the async generator directly) emits a snapshot on connect plus forwards a published mutation.applied as an `event: mutation.applied` SSE frame.	2026-04-21 14:39:12 -04:00
anti	fbf289ff63	feat(bus): host-local UNIX-socket pub/sub worker (DEBT-029) Land the `decnet bus` worker and `get_bus()` factory. Transport is a host-local UNIX-domain socket (0660, group=decnet); authz is the file mode. Wire framing is a tiny verb-line + 4-byte-BE length + orjson body. NATS-style wildcard topics (`*`, `>`). At-most-once, fire-and-forget — DB stays the source of truth. `FakeBus` / `NullBus` for tests and the disabled path. Cross-host federation is deferred to a future `--bridge-tcp` mode; DEBT-030 is master-only and unblocked.	2026-04-21 13:49:02 -04:00
anti	d9f3824086	test(topology): cover compose labels and tolerate docker filter kwarg test_compose asserts the new decnet.topology.* labels land on both base deckies (role=base, no service marker) and service fragments (service=true). The stub docker client in test_deploy grew a filters kwarg so it keeps matching the real .networks.list(filters=...) call signature now used by the deployer.	2026-04-21 10:24:15 -04:00
anti	0cdcfe2653	feat(agent/collector): topology-label discovery and master-authoritative supersede Legacy fleet deckies live in decnet-state.json; MazeNET topology containers don't. Tag them at compose-time with decnet.topology.service=true and let the collector match on that label. Spin up the agent's log collector on the first successful /topology/apply (not in the lifespan — that would break the no-docker-on-boot invariant) and tear it down with the app. Land log lines in DECNET_AGENT_LOG_FILE, separate from master-side DECNET_INGEST_LOG_FILE, so a dev box running both roles can't forward its own ingest back to itself. When master pushes a topology that differs from whatever is pinned locally, teardown the predecessor and accept the new one. Refusing with 409 left the agent stranded after partial deploys. record_error now persists the hydrated blob so a later teardown can still walk the LAN list — otherwise a half-failed apply strands containers + bridges with no breadcrumb back to them.	2026-04-21 10:23:10 -04:00
anti	12e18b75db	feat(swarm): expose needs_resync on TopologySummary + upsert record_error Two small observability follow-ups to the phase-1 agent/topology wiring: TopologySummary now carries needs_resync so operators can see the heartbeat's resync flag via the topology list/detail API without dropping into the DB. TopologyStore.record_error becomes an upsert — when a docker/compose failure fires during the first materialise (put() never reached), we still land a marker row so GET /topology/state surfaces the error and the next heartbeat carries an empty applied_version_hash. That empty hash is what master's heartbeat check relies on to flag the topology for resync instead of assuming the apply succeeded.	2026-04-21 01:41:30 -04:00
anti	0a14dbc9f4	test(agent): pin no-auto-restore-on-boot invariant for topology cache Four regression tests guarding Step 8 of the agent/topology wiring: - Lifespan startup must not call docker.from_env even with a populated topology.db — replace docker with a boom-stub and assert zero calls. - GET /topology/state returns the cached row verbatim without re-materialising bridges/containers; live observation is read-only. - Static guard: TopologyStore must not grow a restore/replay/reapply method without someone re-reading the module docstring. - Raw sqlite read + a second TopologyStore instance confirm the store is passive — nothing scrubs stale rows on open, which is the behaviour master's resync flow depends on.	2026-04-21 01:37:05 -04:00
anti	e8f9c955b3	feat(swarm): heartbeat-driven topology resync for agent-pinned deployments Agent heartbeats now carry an applied-topology snapshot. The master heartbeat handler compares the reported version_hash against what canonical_hash yields for the hydrated topology pinned to that host and flags Topology.needs_resync on divergence (or when the agent reports no topology at all while master expects one). The mutator watch loop gains reconcile_agent_resyncs, which re-pushes the current hydrated blob via AgentClient.apply_topology without touching status, then clears the flag on success. Push failures leave the flag set so the next tick retries.	2026-04-21 01:35:12 -04:00
anti	05d1ebbaaa	feat(engine): route agent-pinned topologies via AgentClient deploy_topology and teardown_topology now branch on target_host_uuid. When set: - Hydrate the topology locally (validator runs exactly as before). - Compute canonical_hash; push {hydrated, version_hash} to the pinned agent through AgentClient.apply_topology. - Status machine still moves PENDING -> DEPLOYING -> ACTIVE on 2xx, PENDING -> DEPLOYING -> FAILED on error; master remains the sole owner of the row. Teardown flips to TEARING_DOWN, fires /topology/teardown, then TORN_DOWN — we log a warning on agent error but still settle to TORN_DOWN so operators can delete the row (agent garbage is cleaned on the next re-enroll). Unihost deploys are unchanged — the field defaults to NULL so every existing flow takes the local path. Step 6 of the agent <-> topology integration.	2026-04-21 01:27:59 -04:00
anti	5f8a746d6e	feat(swarm): AgentClient topology apply/teardown/state methods Three new RPCs mirroring the existing deploy/teardown/status pattern: - apply_topology(hydrated, version_hash) — long-timeout (600s) for image pulls + compose up. - teardown_topology(topology_id) — 300s timeout; enough for a stubborn compose-down without hanging a heartbeat. - get_topology_state() — short control-plane read for reconcile. The per-call timeout swap uses the same trick as .deploy(). Step 5 of the agent <-> topology integration.	2026-04-21 01:26:21 -04:00
anti	13cb0ff38e	feat(agent): topology apply/teardown/state endpoints New mTLS-protected routes on the agent: - POST /topology/apply — master pushes {hydrated, version_hash}. Validates the hash matches locally (serialisation drift guard), runs the topology through the same validator/composer pipeline used master-side, then creates bridges + compose up + records the apply in topology.db. - POST /topology/teardown — dismantles compose, removes bridges, clears topology.db. Idempotent. - GET /topology/state — returns applied row + live docker observation for the heartbeat. Implementation lives in decnet/agent/topology_ops.py; it reuses the private compose helpers from decnet.engine.deployer so we don't duplicate compose/project-name plumbing. The apply path is sync under the hood (docker SDK + subprocess); we hop to a thread so the event loop keeps servicing other agent traffic. v1 is one-topology-per-agent; cross-topology apply returns 409. Step 4 of the agent <-> topology integration.	2026-04-21 01:25:15 -04:00
anti	aea3e7e05b	feat(agent): sqlite-backed topology_store as applied-state cache Single-row sqlite tracking which topology the agent last applied and its version hash. Sync/stdlib, same pattern as the log-forwarder offset store. v1 is one-topology-per-agent; attempting to apply a different topology over a populated row raises AlreadyApplied so the endpoint can return 409. observed() snapshots live docker state (decnet-topology-* bridges + decnet-* containers) for the heartbeat. The store is a cache, not authority — no auto-restore on boot. Master remains the only source of truth. Step 3 of the agent <-> topology integration.	2026-04-21 01:22:01 -04:00
anti	98465af226	feat(topology): canonical_hash for applied-state comparison Tiny pure helper both master and agent will use to answer "is the applied state the one we expect?". SHA-256 of canonical JSON with volatile keys (timestamps, status, version, canvas x/y/w/h) stripped so the hash only captures deployment-relevant state. Step 2 of the agent <-> topology integration.	2026-04-21 01:20:42 -04:00
anti	5a0cf5d7c8	feat(topology): add target_host_uuid to pin topologies to swarm agents Adds the `target_host_uuid` FK on `Topology` plus wiring through the two create endpoints (`POST /topologies`, `POST /topologies/blank`). Validates the mode/host pair: `mode='agent'` now requires a known, routable host; `mode='unihost'` must leave the field unset. Surfaced on `TopologySummary` so list/detail responses expose it. Purely additive at the schema level — existing unihost flows unchanged (field defaults to `NULL`). Step 1 of the agent <-> topology integration.	2026-04-21 01:19:45 -04:00
anti	d06b04221f	feat(api/topology): live mutation queue endpoints (POST/GET /mutations)	2026-04-20 19:38:55 -04:00
anti	ff0b2efbb0	feat(api/topology): pending-only child CRUD for LANs, deckies, edges	2026-04-20 19:37:16 -04:00
anti	999113e3c3	feat(api/topology): POST/DELETE/deploy endpoints for MazeNET topologies	2026-04-20 19:34:35 -04:00
anti	f182c98ffa	feat(api): phase 3 step 2 — topology read endpoints (list/get/status/catalog) GET /api/v1/topologies — paginated list with status filter. Extends repo.list_topologies() to accept limit/offset and adds count_topologies() for the total envelope field. GET /api/v1/topologies/{id} — hydrated TopologyDetail; 404 if missing. GET /api/v1/topologies/{id}/status-events — audit trail, limit-capped. Catalog helpers for the phase-4 canvas UI: * GET /topologies/services — full service catalog. * GET /topologies/next-subnet?base=172.20 — wraps SubnetAllocator against reserved_subnets across non-torn-down topologies. * GET /topologies/{id}/lans/{lan_id}/next-ip — IPAllocator pre-seeded with existing decky IPs in that LAN. All read routes are viewer-or-admin. Sub-routers are included in an order that keeps literal catalog paths (/services, /next-subnet) from being shadowed by the /{topology_id} trie branch.	2026-04-20 18:25:33 -04:00
anti	2379b2aeda	feat(api): phase 3 step 1 — topology request/response models + router skeleton Add Pydantic DTOs in decnet/web/db/models.py covering every phase-3 endpoint shape: TopologyGenerateRequest, TopologySummary/Detail, child create/update requests, MutationEnqueueRequest (Literal op guard), MutationRow with JSON-payload decoder, validation/version/not-editable error envelopes, and the three catalog responses. Create decnet/web/router/topology/ as an import-safe package exporting topology_router (prefix /topologies) — sub-routers land step-by-step in subsequent commits. Mount under the main api router alongside swarm_mgmt. tests/api/topology/test_models.py pins repo-dict ↔ DTO parity so future repo-row drift breaks the contract test before the endpoints.	2026-04-20 18:16:30 -04:00
anti	a76b9ecdf9	feat(mazenet): step 7 — topology_mutations queue + mutator reconciler Adds the live-mutation pipeline for active/degraded topologies: * TopologyMutation table with composite index (state, topology_id) so the watch-loop guard query stays O(log n). * claim_next_mutation is a single atomic UPDATE ... WHERE state='pending' so racing reconcilers deterministically pick one winner; losers see rowcount=0 and skip. * reconcile_topologies drains pending rows per live topology, applies via decnet.mutator.ops.dispatch, and on failure marks the mutation failed + transitions topology to degraded. * run_watch_loop gains a gated branch: flat-fleet mutate_all runs every tick unchanged; the reconciler only enters when the cheap has_pending_topology_mutation guard returns True. * apply_* ops re-check hard invariants (names, IP collisions, subnet overlap, known services, service_config shape) after every mutation so the repo never lands in an invalid state. * CLI: 'decnet topology mutate' / 'mutations' subcommands.	2026-04-20 18:02:37 -04:00
anti	91df57d36b	feat(topology): pending-only mutation repo methods with cascade + guards MazeNET phase 2 step 6. Equips the repo layer with the CRUD the web editor needs before deploy. - TopologyNotEditable exception: raised when a pending-only method hits a non-pending topology. The intent is "free-form edits stop at deploy; the mutator (step 7) takes over for live topologies." - _assert_pending helper checks status inside the session. - update_lan / update_topology_decky accept enforce_pending=True for pre-deploy callers (existing internal callers default to False so behavior is unchanged). - delete_lan: cascades edges; refuses if any decky has only one edge (= this LAN is its home) to prevent orphans. - delete_topology_decky: cascades edges. - delete_topology_edge: bare-bones removal. All four mutators accept expected_version for optimistic concurrency. Existing tests continue to pass (no behavior change for persist/deploy).	2026-04-20 17:50:29 -04:00
anti	9afaac7612	feat(topology): nullable layout coords on LAN + TopologyDecky MazeNET phase 2 step 5. Pure storage — the generator emits None for x/y and the web canvas fills them in later. No logic changes; no compose, deploy, or validator impact.	2026-04-20 17:48:29 -04:00
anti	e475c0957e	feat(topology): optimistic concurrency via Topology.version + expected_version MazeNET phase 2 step 4. Readies the repo layer for concurrent editors (web canvas + CLI + mutator) without lost-write races. - Topology.version: monotonically bumped on supervised child-row writes. - VersionConflict exception carries {current, expected} for the UI. - _check_and_bump_version helper reads Topology in the same session, compares against expected_version, raises on mismatch, bumps on match. Commit happens in the caller's existing transaction so check+bump+write are atomic per mutation. - add_lan / update_lan / add_topology_decky / update_topology_decky / add_topology_edge accept expected_version=None by default, preserving every existing caller's behavior. When expected_version is None, no check runs and version stays put — internal callers (persist) that don't care about concurrency keep working unchanged.	2026-04-20 17:47:28 -04:00
anti	2544d0294a	feat(topology): add pre-deploy validator and wire into deploy_topology MazeNET phase 2 step 3. Blocks deploys of hand-authored topologies that would fail mid-bring-up (orphan deckies, duplicate IPs, overlapping subnets, unknown services) with a structured error list instead of a docker error at startup. Rules (one function each, composable by the editor for inline hints): - exactly one DMZ - every LAN has a bridge chain to the DMZ (BFS via multi-homed deckies) - no orphan deckies - unique LAN and decky names per topology - no IP collisions + IPs inside their LAN's subnet - no LAN subnet overlaps - every service in decnet.fleet.all_service_names() - service_config keys match the decky's declared services deploy_topology runs the validator after hydrate, before any status transition or Docker call; errors raise ValidationError and status stays at pending.	2026-04-20 17:45:32 -04:00
anti	d4f4c58277	feat(topology): thread per-service config overrides through compose MazeNET phase 2 step 2. Mirrors the flat-fleet service_config pattern (DeckyConfig.service_config → composer → svc.compose_fragment) into the topology compose pipeline, so a hand-authored decky can carry overrides like {"ssh": {"password": "megapassword"}} and the ssh fragment reads them just like the flat path does. - _PlannedDecky gains service_config: dict[str, dict]. - persist() stores it under decky_config["service_config"]. - topology/compose.py passes cfg.get("service_config", {}).get(svc, {}) to svc.compose_fragment(service_cfg=...). Schema unchanged — service_config lives inside the existing decky_config JSON blob. Zero changes in decnet/services/*.	2026-04-20 17:42:37 -04:00
anti	1bd1846e40	feat(topology): extract IP + subnet allocators as reusable services MazeNET phase 2 step 1. Pulls inline IP/subnet allocation out of the generator into decnet/topology/allocator.py so the editor + reconciler can reuse the same primitives without duplicating logic. - IPAllocator: stateful host-IP handout with reserve/release/is_free. - SubnetAllocator: /24 handout under a base prefix, skips reservations. - reserved_subnets(repo): collects claimed subnets across every non-torn_down topology so concurrent drafts cannot collide. - generate() accepts reserved_subnets= to skip existing claims. Generator output is byte-identical under seed (behavior preserved).	2026-04-20 17:41:17 -04:00
anti	80e3c28234	test(topology): deploy dry-run + failure-path + live docker e2e Covers dry-run compose emission (no status change), FAILED transition with reason logged on daemon errors, teardown from FAILED, and a live-marked end-to-end test that creates/removes bridge networks against a real docker daemon (skipped on CI).	2026-04-20 16:57:43 -04:00
anti	2a030bf3a9	feat(topology): add compose generator and deployer integration Adds per-topology compose generation (one Docker bridge network per LAN, multi-homed bridge deckies, ip_forward sysctl for L3 forwarders) plus async deploy_topology/teardown_topology in the engine. Leaf-first teardown via BFS-named LAN reverse sort; partial-state safe on failure.	2026-04-20 16:54:40 -04:00
anti	33f139ecfa	feat(mazenet): topology package — config, status machine, generator, persistence Adds decnet/topology/ with: - config.TopologyConfig: pydantic model driving generation (depth, branching_factor, deckies_per_lan_min/max, bridge_forward_probability, cross_edge_probability, subnet_base_prefix, service selection, seed). Emits GeneratedTopology dataclass (lans, deckies, edges). - status.TopologyStatus + assert_transition: seven-state machine with an explicit legal-transition table. torn_down is terminal; degraded is schema-reserved for future Healer use. - generator.generate: deterministic DAG generation under config.seed. Builds a tree of LANs (DMZ at root), plants deckies in each LAN, promotes one decky per non-DMZ LAN to a parent bridge, and rolls cross-edges per cross_edge_probability for DAG shape. - persistence: persist() writes a plan to the repo as pending; transition_status() enforces state-machine legality; hydrate() loads topology + children into a single dict. Covered by tests/topology/{test_status,test_generator,test_persistence}.	2026-04-20 16:48:20 -04:00
anti	47cd200e1d	feat(mazenet): repo methods for topology/LAN/decky/edge/status events Adds topology CRUD to BaseRepository (NotImplementedError defaults) and implements them in SQLModelRepository: create/get/list/delete topologies, add/update/list LANs and TopologyDeckies, add/list edges, plus an atomic update_topology_status that appends a TopologyStatusEvent in the same transaction. Cascade delete sweeps children before the topology row. Covered by tests/topology/test_repo.py (roundtrip, per-topology name uniqueness, status event log, cascade delete, status filter) and an extension to tests/test_base_repo.py for the NotImplementedError surface.	2026-04-20 16:43:49 -04:00
anti	4197441c01	fix(ci): skip live service isolation	2026-04-20 13:14:48 -04:00
anti	1b70d6db87	fix(ci): added skipif on mysql absence	2026-04-20 13:07:31 -04:00
anti	f064690452	fixed(tests): jwt_lazy	2026-04-20 02:26:54 -04:00
anti	dd82cd3f39	fixed(tests): mode_gating	2026-04-20 02:18:11 -04:00
anti	47f2ca8d5f	added(tests): schemathesis contract fuzzing at the agent and swarmctl level	2026-04-20 01:27:39 -04:00
anti	da3e675f86	fix(tests): fixed locust fixtures and rampups, since >100 generally isn't very well managed	2026-04-20 01:26:56 -04:00
anti	4abfac1a98	fix: monkeypatch test db URL	2026-04-20 00:03:12 -04:00
anti	9eca33938d	chore: deleted swp file	2026-04-19 23:51:59 -04:00
anti	195580c74d	test: fix templates paths, CLI gating, and stress-suite harness - tests/*: update templates/ → decnet/templates/ paths after module move - tests/mysql_spinup.sh: use root:root and asyncmy driver - tests/test_auto_spawn.py: patch decnet.cli.utils._pid_dir (package split) - tests/test_cli.py: set DECNET_MODE=master in api-command tests - tests/stress/conftest.py: run locust out-of-process via its CLI + CSV stats shim to avoid urllib3 RecursionError from late gevent monkey-patch; raise uvicorn startup timeout to 60s, accept 401 from auth-gated health, strip inherited DECNET_ env, surface stderr on 0-request runs - tests/stress/test_stress.py: loosen baseline thresholds to match hw	2026-04-19 23:50:53 -04:00
anti	262a84ca53	refactor(cli): split decnet/cli.py monolith into decnet/cli/ package The 1,878-line cli.py held every Typer command plus process/HTTP helpers and mode-gating logic. Split into one module per command using a register(app) pattern so submodules never import app at module scope, eliminating circular-import risk. - utils.py: process helpers, _http_request, _kill_all_services, console, log - gating.py: MASTER_ONLY_* sets, _require_master_mode, _gate_commands_by_mode - deploy.py: deploy + _deploy_swarm (tightly coupled) - lifecycle.py: status, teardown, redeploy - workers.py: probe, collect, mutate, correlate - inventory.py, swarm.py, db.py, and one file per remaining command __init__.py calls register(app) on each module then runs the mode gate last, and re-exports the private symbols tests patch against (_db_reset_mysql_async, _kill_all_services, _require_master_mode, etc.). Test patches retargeted to the submodule where each name now resolves. Enroll-bundle tarball test updated to assert decnet/cli/__init__.py. No behavioral change.	2026-04-19 22:42:52 -04:00
anti	d1b7e94325	fix(swarm): inject peer cert into ASGI scope for uvicorn <= 0.44 Uvicorn's h11/httptools HTTP protocols don't populate scope['extensions']['tls'], so /swarm/heartbeat's per-request cert pinning was 403ing every call despite CERT_REQUIRED validating the cert at handshake. Patch RequestResponseCycle.__init__ on both protocol modules to read the peer cert off the asyncio transport and write DER bytes into scope['extensions']['tls']['client_cert_chain']. Importing the module from swarm_api.py auto-installs the patch in the swarmctl uvicorn worker before any request is served.	2026-04-19 22:09:11 -04:00
anti	bf01804736	feat(agent): periodic heartbeat loop posting status to swarmctl New decnet.agent.heartbeat asyncio loop wired into the agent FastAPI lifespan. Every 30 s the worker POSTs executor.status() to the master's /swarm/heartbeat with its DECNET_HOST_UUID for self-identity; the existing agent mTLS bundle provides the client cert the master pins against SwarmHost.client_cert_fingerprint. start() is a silent no-op when identity env (HOST_UUID, MASTER_HOST) is unset or the worker bundle is missing, so dev runs and un-enrolled hosts don't crash the agent app. On non-204 responses the loop logs loudly but keeps ticking — an operator may re-enrol mid-session, and fail-closed pinning shouldn't be self-silencing.	2026-04-19 21:49:34 -04:00
anti	62f7c88b90	feat(swarmctl): --tls with auto-issued or BYOC server cert swarmctl CLI gains --tls/--cert/--key/--client-ca flags. With --tls the controller runs uvicorn under HTTPS + mTLS (CERT_REQUIRED) so worker heartbeats can reach it cross-host. Default is still 127.0.0.1 plaintext for backwards compat with the master-CLI enrollment flow. Auto-issue path (no --cert/--key given): a server cert signed by the existing DECNET CA is issued once and parked under ~/.decnet/swarmctl/. Workers already ship that CA's ca.crt from the enroll bundle, so they verify the endpoint with no extra trust config. BYOC via --cert/--key when the operator wants a publicly-trusted or externally-managed cert. The auto-cert path is idempotent across restarts to keep a stable fingerprint for any long-lived mTLS sessions.	2026-04-19 21:46:32 -04:00
anti	148e51011c	feat(swarm): agent→master heartbeat with per-host cert pinning New POST /swarm/heartbeat on the swarm controller. Workers post every ~30s with the output of executor.status(); the master bumps SwarmHost.last_heartbeat and re-upserts each DeckyShard with a fresh DeckyConfig snapshot and runtime-derived state (running/degraded). Security: CA-signed mTLS alone is not sufficient — a decommissioned worker's still-valid cert could resurrect ghost shards. The endpoint extracts the presented peer cert (primary: scope["extensions"]["tls"], fallback: transport.get_extra_info("ssl_object")) and SHA-256-pins it to the SwarmHost.client_cert_fingerprint stored for the claimed host_uuid. Extraction is factored into _extract_peer_fingerprint so tests can exercise both uvicorn scope shapes and the both-unavailable fail-closed path without mocking uvicorn's TLS pipeline. Adds get_swarm_host_by_fingerprint to the repo interface (SQLModel impl reuses the indexed client_cert_fingerprint column).	2026-04-19 21:37:15 -04:00
anti	f576564f02	fix(agent): also wipe /etc/decnet during self-destruct	2026-04-19 21:04:31 -04:00
anti	00d5799a79	fix(agent): escape systemd cgroup when spawning self-destruct reaper The reaper was being SIGTERM'd mid-rm because `start_new_session=True` only forks a new POSIX session — it does not escape decnet-agent.service's cgroup. When the reaper ran `systemctl stop decnet-agent`, systemd tore down the whole cgroup (reaper included) before `rm -rf /opt/decnet*` finished, leaving the install on disk. Spawn the reaper via `systemd-run --collect --unit decnet-reaper-<pid>` so it runs in a fresh transient scope, outside the agent unit. Falls back to bare Popen for non-systemd hosts.	2026-04-19 21:00:43 -04:00
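
Commit 98465af226 in the log above describes canonical_hash as a SHA-256 over canonical JSON with volatile keys stripped. The following is a minimal sketch of that idea, not the project's actual code. The concrete key names in VOLATILE_KEYS and the strip_volatile helper are assumptions for illustration; the commit message only names the categories (timestamps, status, version, canvas x/y/w/h).

```python
import hashlib
import json

# Hypothetical key names: the commit message lists categories, not exact fields.
VOLATILE_KEYS = frozenset(
    {"created_at", "updated_at", "status", "version", "x", "y", "w", "h"}
)


def strip_volatile(value):
    """Recursively drop volatile keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {
            key: strip_volatile(inner)
            for key, inner in value.items()
            if key not in VOLATILE_KEYS
        }
    if isinstance(value, list):
        return [strip_volatile(item) for item in value]
    return value


def canonical_hash(hydrated: dict) -> str:
    """SHA-256 hex digest of a key-sorted, whitespace-free JSON encoding."""
    canonical = json.dumps(
        strip_volatile(hydrated),
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two snapshots that differ only in volatile fields hash identically.
master_view = {
    "name": "maze-1",
    "status": "active",
    "version": 7,
    "lans": [{"name": "dmz", "subnet": "172.20.0.0/24", "x": 10, "y": 20}],
}
agent_view = {
    "name": "maze-1",
    "status": "deploying",
    "version": 3,
    "lans": [{"name": "dmz", "subnet": "172.20.0.0/24", "x": None, "y": None}],
}
same = canonical_hash(master_view) == canonical_hash(agent_view)
```

Under these assumptions, a change to a deployment-relevant field such as a LAN subnet changes the digest, while a status flip or a canvas move does not. That is the property the heartbeat comparison in e8f9c955b3 relies on.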


238 Commits
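
Commit fbf289ff63 in the log above mentions NATS-style wildcard topics (`*`, `>`) for the bus. The sketch below shows how such matching is commonly defined, following NATS subject semantics: `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. The function name topic_matches is hypothetical, and the real decnet matcher may differ in edge cases.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if a dot-separated topic matches a NATS-style pattern."""
    pattern_tokens = pattern.split(".")
    topic_tokens = topic.split(".")

    for index, token in enumerate(pattern_tokens):
        if token == ">":
            # '>' is only valid as the last pattern token and needs
            # at least one topic token left to consume.
            is_last = index == len(pattern_tokens) - 1
            return is_last and len(topic_tokens) > index
        if index >= len(topic_tokens):
            return False  # pattern is longer than the topic
        if token != "*" and token != topic_tokens[index]:
            return False  # literal token mismatch

    # Without a trailing '>', both sides must have the same token count.
    return len(pattern_tokens) == len(topic_tokens)


# Topic names taken from the commit messages above.
one_level = topic_matches("attacker.*", "attacker.scored")
too_deep = topic_matches("attacker.*", "attacker.session.started")
tail = topic_matches("attacker.>", "attacker.session.started")
per_decky = topic_matches("decky.*.traffic", "decky.abc123.traffic")
```

With these semantics a single subscription to `attacker.>` would cover observed, scored, fingerprinted, and the session.* leaves, while `attacker.*` would cover only the one-level leaves.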