- Add _MixinBase abstract class to _helpers.py: declares _session(),
_deserialize_attacker(), _assert_pending(), _check_and_bump_version(),
and list_running_topology_deckies() so mypy can see cross-mixin contracts
- Add _require(val, msg) helper for narrowing T | None → T
- Inherit _MixinBase in all 26 leaf mixin classes
- Wrap SQLAlchemy column method calls (.is_(), .like(), .notin_(), .in_(),
.contains()) with col() from sqlmodel — fixes attr-defined false positives
caused by pydantic plugin typing class-level fields as Python value types
- Wrap select(Model.field) with select(col(Model.field)) for column projections
- Add pyproject.toml [[tool.mypy.overrides]] to disable arg-type in
sqlmodel_repo.*: pydantic plugin resolves .where(Model.field == v) as
where(bool), a false positive; call-arg still catches real argument errors
- Remove 9 stale # type: ignore comments (logging, helpers, credentials)
- Fix telemetry.py traced() overload no-redef + misc
- Fix logs.py datetime/str operator and nullable PK comparison with col()
- sqlmodel_repo/ now has 0 mypy errors
Replace repo: BaseRepository with a structural TopologyRepository protocol
in persistence.py and allocator.py. All read methods now return typed DTOs
(TopologySummary, LANRow, DeckyRow, EdgeRow) instead of raw dicts, eliminating
silent field-shape regressions across the topology subsystem.
TopologySummary gains email_personas and language_default so api_personas.py
can continue reading those fields via attribute access. hydrate() converts
DTOs to dicts before passing to _backfill_decky_configs, keeping the mutable
working-state function dict-based at its boundary. All production callers
(router handlers, mutator, CLI, heartbeat) migrated from dict/get access to
attribute access. 134 tests pass.
MutationRow.op was str despite _MUTATION_OPS existing; Topology.mode/status,
TopologyDecky.state, TopologyMutation.op/state carried valid values only in
comments; deferred json import had no justification.
- Promote _MUTATION_OPS before table classes so table fields can reference it
- Add sa_column=Column(String) on each Literal-annotated table field to satisfy
SQLModel 0.0.38 column-type inference
- Move import json to module top; remove deferred import inside _decode_json_payload
- MutationRow.op: str -> _MUTATION_OPS
await inside a threading.Lock yields to the event loop while the OS
thread still holds the lock — potential deadlock under FastAPI thread
pool dispatch. asyncio.Lock is the correct primitive for async
critical sections. Also fixed stale diurnal.py docstring that had the
delegation direction backwards.
_parse_weights was silently dropping content_class values that don't
belong on their target list with no operator feedback. Changed it to
return (weights, dropped), apply_payload to collect and return all
dropped names, and put_config to include dropped_entries in the
response when non-empty.
get_config was calling planner.apply_payload on every GET request, racing
concurrent reads on module-level globals. Added a _hydrated flag + lock
so DB hydration runs at most once per process lifetime; put_config marks
it done too. Test fixture resets the flag between tests.
ServiceNotFoundError (→ 404) and ServiceConflictError (→ 409) replace the
"not found" / "already on" / "not on" substring checks in _map_mutation_error;
base ServiceMutationError still maps to 422. Fixes three pre-existing test
status-code assertions (201 vs 200 on POST endpoints).
Pure tarball construction (_build_tarball, _render_*, _iter_included,
_SYSTEMD_UNITS) moved to decnet/swarm/bundle_builder.py — no FastAPI
dependency, independently testable. EnrollBundleRequest/Response moved
to decnet/web/db/models/swarm.py alongside the other swarm DTOs.
Router drops from 504 to 260 lines; keeps only the in-memory token
registry, sweeper, and endpoints.
Attacker probe emails are now forwarded by the master (realism worker)
rather than inside the MACVLAN container, which has no internet gateway.
- New smtp.probe.pending bus topic: ingester publishes when smtp_relay
message_stored fires; worker subscribes and does the actual delivery
- decnet/orchestrator/drivers/smtp_relay.py: pure-sync forward_probe()
reads the .eml from disk and sends via smtplib on a thread executor
- worker.py: _run_smtp_probe_listener + _handle_probe_pending subtask;
limit enforced via count_probe_relays() (DB-backed, restart-safe)
- bounties.py: count_probe_relays() query on probe_relay bounty type
- fleet.py: get_fleet_decky_by_name() to pull service config from DB
- services/smtp_relay.py: upstream_* and probe_limit fields defined in
config_schema but NOT injected into container env (credentials stay
out of docker env vars)
- ingester.py: stripped of smtplib; publishes probe.pending and exits
- tests: assert upstream keys absent from container environment
Adds probe_forwarded to meaningful event kinds and stores it in the
bounty table as bounty_type=probe_relay with forwarded=true/false, so
the dashboard shows whether the upstream actually accepted the test email.
Once a fingerprint canary's HTTP beacon passes all 4 validation layers
and the trigger row lands, the token is immediately set to state=revoked
and canary.<id>.revoked is published on the bus. The slug lookup is
tightened to only return planted tokens, so subsequent requests to the
same URL silently return the transparent GIF without persisting anything
(stealth posture preserved). Plain http/dns canaries with no
fingerprint_nonce are not affected.
Changes:
- sqlmodel_repo/canary.py: add state == "planted" filter to
get_canary_token_by_slug so revoked slugs resolve to None
- worker.py: after record_canary_trigger, if parsed_fp survived all
layers and token has a fingerprint_nonce, call
update_canary_token_state("revoked") + publish CANARY_REVOKED; errors
are best-effort (trigger row already landed)
- test_worker_http.py: assert state=revoked in test_fp_valid_nonce_persists;
new test_fp_deregisters_slug_after_valid_hit (second hit records nothing);
new test_plain_http_canary_not_deregistered (env_file stays planted)
Adds per-mint nonce gating, structural shape validation, mint UUID
consistency checks, and a per-(token, IP) rate limiter to the canary
worker so attackers who extract a canary from a decky filesystem cannot
poison fingerprint forensics by replaying or forging ?d= submissions.
Changes:
base.py
fingerprint_nonce: Optional[str] added to CanaryArtifact so generators
can surface the nonce to the cultivator without coupling the generator
directly to DB code.
obfuscator.py
nonce_for(callback_token, mint_uuid): HMAC-SHA256 keyed on
DECNET_CANARY_FINGERPRINT_SECRET, truncated to 16 hex chars.
FingerprintSecretMissing raised at mint time if env var is unset.
render_fingerprint_js() now accepts nonce= and substitutes MINT_NONCE.
fingerprint_payload.js
New MINT_NONCE placeholder. Appended as &k= on all beacon URLs (bare-open,
single-shot, chunked). Using &k= avoids colliding with &n= (chunk total).
fingerprint_html.py / fingerprint_svg.py
Derive nonce via nonce_for() and pass to render_fingerprint_js(). Set
artifact.fingerprint_nonce so the cultivator can persist it.
cultivator.py
Passes fingerprint_nonce into create_canary_token() when present on the
artifact; NULL for all non-fingerprint generators.
canary.py (model)
fingerprint_nonce: Optional[str] = Field(default=None, max_length=16)
added to CanaryToken. None for non-fingerprint tokens.
worker.py
_extract_fingerprint now returns (meta_dict, parsed_fp) tuple.
_record_hit accepts parsed_fp + raw_nonce and runs 4 layers after
token lookup: nonce match, shape check, mint UUID consistency, rate limit.
Each failure sets _fp_invalid_* flag and drops structured _fp.
Trigger row always lands regardless.
tests/canary/conftest.py
Session-scoped autouse fixture sets DECNET_CANARY_FINGERPRINT_SECRET so
fingerprint generator and worker tests work offline.
tests
5 new worker HTTP tests and 2 new generator tests covering each
validation layer.
- DeckyServiceAddRequest gains an optional `config: dict` field, validated
against the service's config_schema before any state mutation (400 on
bad type, no half-written rows).
- Engine: add_service threads `config` into _add_topology_service /
_add_fleet_service, persisting validated cfg to decky_config.service_config
BEFORE compose regen so the first `up -d --build` materialises the env on
the new container. No follow-up apply needed.
- Frontend: shared AddServiceConfigModal — same wizard accordion shape, used by:
* DeckyCard's ADD SERVICE picker (Fleet & MazeNET inspectors via shared component)
* MazeNET Inspector's ADD SERVICE picker
* MazeNET palette drag-drop onto a deployed decky
Empty-schema services short-circuit to a one-click add (no modal flash).
Operator can cancel; errors surface in the modal.
- Tests: add_service config plumbing — persist, drop unknown keys, 400-equivalent
on bad types, back-compat empty-config.
- Drive-by: fix stale repo-method names in test_services_live.py
(create_topology_decky → add_topology_decky, get_topology_decky → list+pick helper,
service.added → service_added topic).
- GET /topologies/services/{name}/schema serves the declared ServiceConfigField
metadata so the Inspector can auto-render forms.
- PUT /(topologies/{id}/)deckies/{decky}/services/{svc}/config persists the
validated dict (DB + compose); container untouched (Save).
- POST /(topologies/{id}/)deckies/{decky}/services/{svc}/apply persists then
force-recreates <decky>-<svc> so the new env takes effect (Apply, destructive).
- New engine helper update_service_config wires both fleet and topology paths
through the existing _persist_fleet_change / _rerender_topology_compose
machinery; emits decky.<name>.service_config_changed on the bus.
Two related fixes that came out of running the W5 tests locally:
1. tests/__init__.py — empty file, makes 'tests/' a package so pytest
stops inserting it into sys.path. Without it, 'tests/docker/'
(the docker-image test category) shadowed the installed docker SDK
on every engine-touching test in the repo:
module 'docker' has no attribute 'DockerClient'
Pytest's default --import-mode=prepend was the culprit; making
tests/ a package is the cheapest fix and doesn't change
--import-mode for the whole tree.
2. delete_topology_decky / delete_topology_edge / delete_lan grow an
'enforce_pending: bool = True' kwarg. Default preserves the HTTP
CRUD guard (api_decky_crud / api_edge_crud / api_lan_crud get the
409 for free). apply_remove_decky / apply_detach_decky /
apply_remove_lan now pass enforce_pending=False — the mutator
queue is the live-editing surface and has its own active-topology
gating; the repo's pending-only guard was for design-time CRUD
that mustn't bypass it. Without this, apply_remove_decky was
silently broken on active topologies pre-W5; W5's new test
surfaced it on first run.
10/10 new W5 tests pass; 58/58 across mutator + topology suites.
Bus topic segments are NATS-style tokens and the validator at
bus/topics.py:402 rejects '.', '*', '>', whitespace. My W3 constants
'service.added' / 'service.removed' tripped this on every live
add/remove call:
ValueError: topic segment 'service.added' may not contain '.', ...
Renamed both to underscore form: DECKY_SERVICE_ADDED = 'service_added'.
Aligned the SSE forwarder's name mapping (decky.<name>.service_added →
SSE event 'decky.service_added') and the frontend's
useTopologyStream listener + MazeNET.tsx event handler. Also updated
the wiki entry with a note about the underscore.
The /topologies/{id}/events SSE proxy now subscribes to two bus
patterns concurrently and merges them through a bounded asyncio.Queue:
* topology.{id}.> — lifecycle (status, mutation.*) — unchanged.
* decky.> — per-decky events, filtered by payload.topology_id
so a fleet decky sharing a name with a topology
decky doesn't leak across.
_sse_name_for routes 'decky.<name>.service.added' to the SSE event
name 'decky.service.added' (kept the prefix so the frontend doesn't
collide with topology lifecycle events that share leaf names like
'status').
useTopologyStream surfaces the two new event names; MazeNET.tsx's
onStreamEvent optimistically patches the matching node's services
list so a second tab reflects shape changes without a refetch.
Adds a fleet_singletons array to ServiceCatalogResponse so per-decky
add UIs can filter out services like LLMNR that run once fleet-wide
(and would 422 server-side at the live add endpoint).
The existing 'services: list[str]' field is unchanged for back-compat
with MazeNET/useMazeApi.ts:257; the new field is additive.
decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a
module-scoped cache (registry only changes on BYOS install / plugin
drop, neither of which happens mid-session) and exposes a precomputed
.perDecky list so consumers don't need to re-derive the diff.
decnet.engine.services_live exposes add_service / remove_service for
both fleet and topology decky scopes. The host's _compose() wrapper
already supported per-service targeting (up --no-deps -d <svc>,
stop, rm -f); what was missing was the orchestration around it:
* add: validate against decnet.services.registry (rejects unknown +
fleet_singleton); persist the new services list; re-render the
per-scope compose file (so future redeploys reflect the change);
run docker compose up -d --no-deps --build <decky>-<svc>.
* remove: stop + rm -f the service container; persist; re-render
compose so a future up -d doesn't bring it back.
Both publish decky.<name>.service.added / .removed on the bus, with
the post-mutation services list. Topic constants added to
decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md
ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored).
Four new admin endpoints:
* POST/DELETE /api/v1/deckies/{name}/services{,/svc}
* POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc}
ServiceMutationError messages are mapped at the API boundary to 404
(decky/topology missing), 409 (idempotency violation), 422 (unknown
or fleet_singleton service).
Extracts the docker-exec-with-base64-stdin pattern out of canary/planter
and orchestrator/drivers/ssh into a shared decnet.decky_io package.
Both consumers now delegate; the canary planter test still proves the
contract end-to-end.
Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops.
Container resolution is shared with the canary path: topology_id absent
means fleet (<name>-ssh), present routes through resolve_decky_container
which picks <name>-ssh when the topology decky exposes ssh, else the
topology base container decnet_t_<id8>_<name>.
Path validation rejects relative paths and '..' traversal at the request
model layer. Bad base64 → 400; unknown topology → 404; decky not in
topology → 422; docker exec failure → 409.
POST /api/v1/canary/tokens grows an optional topology_id field. When
present, the server hydrates the topology, validates the named decky is
in it, and resolves the docker container via
planter.resolve_topology_container — <name>-ssh if the decky exposes ssh,
else the topology base container. Absent ⇒ fleet semantics, unchanged.
The token row gets a nullable topology_id column (no migration helper
per pre-v1 policy). GET /api/v1/canary/tokens accepts ?topology_id= as
a filter. DELETE re-resolves the container at revoke time so a
redeployed topology is still reachable.
422 when the named decky isn't in the topology; 404 when the topology
itself doesn't exist.
The Bounty Vault page only read from the Bounty table, but
inotifywait-captured file drops (event_type=file_captured) and SMTP
quarantined messages (event_type=message_stored) were only landing in
the Logs table. AttackerDetail's tabs queried logs directly, so they
showed up per-attacker but were invisible on the global Vault page.
Mirror both events into Bounty as bounty_type=artifact with
payload.kind ∈ {file, mail} so the existing dedup
(bounty_type, attacker_ip, payload) collapses repeats by sha256. Add an
ARTIFACTS segment to the Vault filter row, plus dedicated render
branches: file drops show orig_path + size + writer attribution; mail
shows subject + From + attachment count + size, with the Mail icon
distinguishing them from FileText for file drops.
Forward-only — existing logs stay where they are. A backfill pass would
be straightforward (read Log WHERE event_type IN ('file_captured',
'message_stored') and feed each row through _extract_bounty) but is out
of scope here.
_load_service_container_names() reads decnet-state.json and builds the
exact set of expected container names ({decky}-{service}). is_service_container()
and is_service_event() do a direct set lookup — no regex, no label
inspection, no heuristics.
Two bugs caused the log file to never be written:
1. is_service_container() used regex '^decky-\d+-\w' which only matched
the old decky-01-smtp naming style. Actual containers are named
omega-decky-smtp, relay-decky-smtp, etc. Fixed by using Docker Compose
labels instead: com.docker.compose.project=decnet + non-empty
depends_on discriminates service containers from base (sleep infinity)
containers reliably regardless of decky naming convention.
Added is_service_event() for the Docker events path.
2. The collector was only started when --api was used. Added a 'collect'
CLI subcommand (decnet collect --log-file <path>) and wired it into
deploy as an auto-started background process when --api is not in use.
Default log path: /var/log/decnet/decnet.log
Services now print RFC 5424 to stdout; Docker captures via json-file driver.
A new host-side collector (decnet.web.collector) streams docker logs from all
running decky service containers and writes RFC 5424 + parsed JSON to the host
log file. The existing ingester continues to tail the .json file unchanged.
rsyslog can consume the .log file independently — no DECNET involvement needed.
Removes: bind-mount volume injection, _LOG_NETWORK bridge, log_target config
field and --log-target CLI flag, TCP syslog forwarding from service templates.