Four-part fix for the collection bottleneck that was blocking the dev loop:
1. Lazy mitreattack.stix20 import in attack_stix.py — deferred to first
_load() call (TYPE_CHECKING guard at top level)
2. Lazy misp_stix_converter import in both MISP export routers — moved
from module level into the route handler body
3. Lazy attack_catalog / attack_stix in ttp.py repo mixin — thin wrapper
functions so the import chain never fires at module load time
4. tests/api/conftest.py — `from decnet.web.api import app` moved inside
the `client()` fixture; `pytest_ignore_collect` broadened to skip all
test_schemathesis*.py variants (not just test_schemathesis.py), which
were launching a subprocess server at module-import time
5. pyproject.toml — `norecursedirs` for tests/live, tests/stress,
tests/service_testing, tests/docker, tests/perf so these directories
are never entered; `-m` filter removed from addopts (now redundant);
`--dist loadscope` → `--dist load` to unblock workers immediately
6. behave_core / behave_shell rename — BEHAVE packages dropped the
`decnet_` prefix; reinstalled editable installs and updated all 14
import sites across profiler, ttp, bus, and correlation modules
Remaining files from the fingerprint-bounties + characterizes-SRO commit:
misp_export, repository, bounties mixin, all 4 router endpoints, and test suite
updates. Prerequisite: previous commit added _extract_fingerprint_bounty_data
and the stix_export changes.
Adds GET /api/v1/attackers/{uuid}/export/misp and
GET /api/v1/attackers/export/misp backed by misp_export.py, which
converts existing STIX bundles to MISP events via misp-stix
ExternalSTIX2toMISPParser. Fleet endpoint emits {response:[...]}
collection (one event per attacker). Frontend: STIX/MISP buttons on
AttackerDetail header and Attackers list. 13 new tests green.
GET /api/v1/attackers/{uuid}/export/stix returns a self-contained STIX
2.1 bundle: ip observation, threat-actor, ATT&CK attack-patterns with
canonical MITRE IDs, uses relationships, per-tag sightings, file SCOs
for artifacts, domain-name SCOs for SMTP targets, and a provider intel
note. Attack-pattern SDOs carry the MITRE bundle IDs so consumers
deduplicating against the public ATT&CK bundle get exact matches.
Surfaces the intrusion-set reverse index from the loaded ATT&CK
bundle: given a technique, returns the list of groups MITRE has
documented as using it. Read-only — explicitly NOT an attribution
claim about a DECNET attacker. The frontend pulls this lazily when
the operator expands a technique panel; payload-size cost on every
TTPTagDetailRow makes embedding wasteful for techniques with 50+
documented groups.
- decnet/web/router/ttp/api_get_groups_for_technique.py exposes
GET /api/v1/ttp/techniques/{technique_id}/groups, response_model
list[GroupRef]. Same JWT-viewer auth gating as the rest of the
TTP router. 404 when the technique_id doesn't resolve in the
bundle.
- Sub-techniques are queried directly (no auto-union with parent)
to match ATT&CK Navigator semantics; callers that want a broader
view query the parent themselves.
- tests/ttp/test_groups_for_technique.py covers happy path, 404,
sub-technique attribution independence, empty-list-on-zero-groups,
and that responses include mitre_url + aliases.
- tests/web/test_api_attackers.py: fix pre-existing fixture drift
introduced by a2a61b63 — three TestGetAttackerDetail cases were
missing AsyncMock for repo.latest_observation_per_primitive,
causing TypeError on await of MagicMock. The new groups endpoint
doesn't share code with attacker_detail; this is a drive-by fix
surfaced by the same suite run.
* decnet attribution — Typer command mirroring decnet reuse-correlate
(--multi-actor-tick, --daemon flags). Calls run_attribution_loop
with the dependency-injected repo.
* deploy/decnet-attribution.service.j2 — systemd unit mirroring
decnet-reuse-correlator.service.j2: ExecStart=decnet attribution,
same hardening posture (NoNewPrivileges, ProtectSystem=full,
ProtectHome=read-only, dedicated /var/log/decnet/decnet.attribution.log).
* worker_registry.KNOWN_WORKERS += "attribution" — heartbeat already
publishes as system.attribution.health from
attribution_worker._WORKER_NAME, so the Workers panel surfaces the
row the moment the unit is enabled.
* api_start_all_workers preferred-order list + "attribution" between
reuse-correlator and enrich so a fresh start-all brings it up
alongside its peers.
After this commit `systemctl enable --now decnet-attribution` (or
the dashboard's start-all) actually launches the engine.
GET /api/v1/attackers/{uuid}/attribution
Returns the merger output for an attacker's identity:
{
"identity_uuid": "abc..." | null,
"primitives": [
{primitive, current_value, state, confidence,
observation_count, last_change_ts, last_observation_ts},
...
]
}
Pre-attribution-worker: identity_uuid=null, primitives=[]. Surfacing
identity_uuid keeps the cross-attacker rollup story visible to the
frontend ahead of v1's clusterer landing.
api_events SSE relay also subscribes to attribution.> and forwards
to the AttackerDetail page filtered on payload.identity_uuid (the
identity is resolved at stream open from the URL's attacker_uuid;
attribution payloads are identity-keyed, not attacker-keyed). New
SSE event names: attribution.state_changed,
attribution.multi_actor_suspected.
Frontend (AttackerDetail.tsx badge rendering, useAttackerStream
consumer) deferred — there's already WIP on AttackerDetail.tsx in
the working tree; merging the badge logic is a separate commit
once that lands.
Tests: 4 endpoint scenarios — 401 unauth, 404 unknown attacker,
200 empty (no stub), 200 with primitive-ordered rows.
GET /api/v1/attackers/{uuid}/events streams behavioural events for
one attacker. Mirrors decnet/web/router/topology/api_events.py
end-to-end: ?token= auth, require_stream_viewer gate,
sse_connection_slot per-user cap, snapshot-on-connect, three bus
subscriptions (attacker.observation.>, attacker.fingerprint_rotated,
attacker.scored) merged through asyncio.Queue, 15s keepalive,
request.is_disconnected() exit, finally task cancellation.
Per-attacker filter keys on payload['attacker_uuid'] which the
profiler worker stamps onto every published payload (Phase 5 P5.0
amendment) — O(1) drop without a repo round-trip per event.
_sse_name_for derives SSE event names:
attacker.observation.<primitive> → observation.<primitive>
attacker.fingerprint_rotated → fingerprint.rotated
attacker.scored → attacker.scored
10 tests cover snapshot, live forward, per-attacker filter (drops
other attackers' events), fingerprint.rotated forward, 404, 401, and
the sse-name derivation across all four cases. Topology events
regression green.
Move `_find_shard_with_sid`, `_resolve_shard`, `_validate_names`,
`_get_index`, and the index cache from
`decnet/web/router/transcripts/api_get_transcript.py` into
`decnet/artifacts/shards.py`. The shared module speaks
`ValueError`; the router keeps thin wrappers that translate to
`HTTPException(400)` so the route's error UX is unchanged.
This unblocks the BEHAVE-INTEGRATION Phase 4 worker wiring — the
profiler worker (and the collector's session aggregator) need to
disk-reach asciinema shards but must not import from a FastAPI
router.
11 new unit tests for the shared helper. Existing transcript router
tests pass (the shard fixture's monkeypatch points at the shared
module's ARTIFACTS_ROOT now).
Destructive half of BEHAVE-INTEGRATION.md Phase 1. SessionProfile +
its kd_* columns + the dialect ALTER TABLE migration helpers are
deleted outright; pre-v1, the table shipped empty, no migration
ceremony required (per the no-new-_migrate_-pre-v1 memory rule).
DEBT-036 closes via DEBT-050 supersedure. AttackerDetail's
``observations`` field is wired to the new ``observations`` table
and returns an empty list until the BEHAVE-SHELL extractor (DEBT-050
Phase 2) starts emitting.
decnet/web/db/models/attackers.py — SessionProfile class deleted
(~135 lines), KD_PAUSE_*/KD_START_OF_ACTION_IDLE_S module constants
deleted, module docstring updated to point at the observations
table. AttackerIdentity.kd_digraph_simhash is KEPT — it's the v2
federation centroid hook, not a SessionProfile field; docstring
repointed to the BEHAVE primitive that will populate it.
decnet/web/db/sqlmodel_repo/attackers/sessions.py — DELETED.
SessionProfilesMixin dropped from the AttackersMixin MRO.
decnet/web/db/repository.py — abstract upsert_session_profile +
get_session_profile removed.
decnet/web/db/sqlite/repository.py + mysql/repository.py —
_migrate_session_profile_table helpers and their initialize() calls
removed. mysql initialize() now goes attackers → column_types →
admin (no session_profile step).
decnet/web/db/models/__init__.py — SessionProfile re-export gone.
decnet/web/db/models/attacker_intel.py — docstring cross-reference
to SessionProfile.schema_version retargeted to AttackerIdentity.
decnet/web/router/attackers/api_get_attacker_detail.py — adds
``observations: []`` to the response by calling
``repo.latest_observation_per_primitive(uuid)`` and projecting to a
list sorted by primitive path. Empty until the extractor lands;
shape matches BEHAVE-INTEGRATION.md §"AttackerDetail consumer".
tests/profiler/test_session_profile.py — DELETED (56 lines).
tests/db/test_base_repo.py — DummyRepo loses upsert_session_profile
and get_session_profile overrides.
tests/db/mysql/test_mysql_migration.py — initialize-call-order
assertion updated; session_profile step removed from the expected
sequence; docstring records why.
tests/ttp/test_lifter_absence.py — docstring "no SessionProfile" →
"no ObservationRow".
New GET /api/v1/orchestrator/events/stats?since=1h&success=false&kind=...
backed by repo.count_orchestrator_failures(since_ts, kind), which
counts failed rows across both orchestrator_events and
orchestrator_emails since the cutoff.
Window parser accepts ^\d+[smhd]$, capped at 7d. Today only
success=false is accepted on this surface so the endpoint isn't
accidentally repurposed before the next consumer is properly
designed.
Orchestrator.tsx polls the endpoint on mount + every 30 s and
renders the authoritative DB-derived count instead of deriving from
the in-memory SSE buffer + one paginated page (which silently
excluded failures older than the local window).
Move artifact path validation + symlink-escape check out of the
admin-gated download endpoint into decnet/artifacts/paths.py so the
TTP EmailLifter can disk-reach .eml files at tag-time without
duplicating regex/root logic (DEBT-047).
The router now catches ArtifactPathError and re-raises HTTPException(400);
behavior is unchanged.
"T1595" alone is opaque; "T1595 — Active Scanning" tells you the
story at a glance. The names come from a backend-side static catalogue
pinned to the same ATT&CK release as the rule engine
(_ATTACK_RELEASE = "v15.1") — names are the canonical MITRE labels,
not author-supplied strings on rules, so a rule author can't typo a
name and the entire fleet sees the typo.
- New `decnet/ttp/attack_catalog.py` with `TECHNIQUE_NAMES` covering
every technique_id + sub_technique_id emitted by `rules/ttp/`
(R0001..R0058 → 69 IDs in the v0 pack).
- `IdentityTechniqueRow` / `TechniqueRollupRow` / `CampaignTechniqueRow`
/ `TTPTagDetailRow` gain optional `technique_name` /
`sub_technique_name` fields. Repo + router populate them from the
catalogue at row-construction time. None when an ID isn't in the
catalogue — UI falls back to the bare ID.
- Coverage test (`tests/ttp/test_attack_catalog.py`) walks every
YAML rule and asserts every emitted ID has a catalogue entry, so
a future rule author who forgets to update the catalogue gets a
loud failure rather than a silent UI fallback.
Frontend:
- `TTPsObservedSection` shows "T1595.002 — Active Scanning:
Vulnerability Scanning" instead of just the ID, with overflow
ellipsis + tooltip for narrow viewports. Inspector header /
TECHNIQUE row also surface the names.
The TTPsObservedSection rollup tells the operator "we saw T1059" but
not why. Click any technique row → side drawer opens listing every
ttp_tag row in scope with the persisted evidence JSON, firing
rule_id / rule_version, source_kind / source_id, confidence, and
created_at. Mirrors the CredentialReuseInspector / BountyInspector
pattern (drawer-backdrop + bd-head/bd-body + kvs grid).
Backend:
- New `GET /api/v1/ttp/tags/by-{scope}/{uuid}/{technique_id}`
(`scope ∈ {identity, attacker, session}`, optional
`?sub_technique_id=`, `?limit=` capped to 1000). Returns raw
TTPTag rows newest-first.
- New `TTPTagDetailRow` Pydantic model + re-export.
- New repo method `list_tags_by_scope_and_technique` on
TTPMixin (+ abstract on BaseRepository) — single query branched
on scope; identity scope projects through `Attacker.identity_id`
the same way `list_techniques_by_identity` does.
- Tests: evidence round-trips, sub_technique filter, JWT-required,
empty scope, unknown scope rejected.
Frontend:
- New `TTPInspector.tsx` + `TTPInspector.css` (violet accent, slide
animation, focus-trapped panel matching the existing inspector
family).
- `TTPsObservedSection`'s TechniqueBar is now click+keyboard
activatable; clicking opens the inspector for that
(technique, sub_technique) tuple.
mypy clean. 532 passed in the targeted sweep.
The endpoint was a contract-phase stub returning `[]` even though the
RuleStore loaded all 58 YAML rules at worker startup. UI saw an empty
table; operators couldn't tell whether anything was wired up.
- `api_list_rules` now calls `get_rule_store().load_compiled()` and
serializes each CompiledRule + its operational state into a
RuleCatalogueRow. Sorted by rule_id for stable golden snapshots.
- Add `description: str` to RuleSchema (pydantic) and CompiledRule
(NamedTuple, defaulted) + propagate through `_compile_one` so the
catalogue surfaces the human-readable YAML description, not just
the slug-style `name`.
- Update `tests/ttp/test_rule_engine.py` _fields assertion for the
new column; new `tests/api/ttp/test_rules_catalogue.py` pins the
catalogue contents (R0001/R0014 presence, row shape, sort order).
Worker behaviour is unchanged: it was already loading rules
correctly. This is purely a read-side wiring fix on the operator API.
Five GET rollup endpoints (techniques, by-identity, by-attacker,
by-campaign, by-session) and the Navigator export (fleet +
per-identity) now call into the TTPMixin methods. Rule catalogue
endpoint still returns [] — backed by the RuleStore which lands
at E.3.5/E.3.6.
Mounts /api/v1/ttp/* with empty-list / empty-Navigator responses.
GET endpoints viewer-gated; POST/DELETE /rules/{rule_id}/state
admin-gated server-side. POST parses JSON manually so a malformed
body returns the documented 400 (per feedback_schemathesis_400).
Drops xfail-strict markers from E.2.8 tests now that the router is
mounted; 26 tests pass against the contract handlers.
Replace repo: BaseRepository with a structural TopologyRepository protocol
in persistence.py and allocator.py. All read methods now return typed DTOs
(TopologySummary, LANRow, DeckyRow, EdgeRow) instead of raw dicts, eliminating
silent field-shape regressions across the topology subsystem.
TopologySummary gains email_personas and language_default so api_personas.py
can continue reading those fields via attribute access. hydrate() converts
DTOs to dicts before passing to _backfill_decky_configs, keeping the mutable
working-state function dict-based at its boundary. All production callers
(router handlers, mutator, CLI, heartbeat) migrated from dict/get access to
attribute access. 134 tests pass.
await inside a threading.Lock yields to the event loop while the OS
thread still holds the lock — potential deadlock under FastAPI thread
pool dispatch. asyncio.Lock is the correct primitive for async
critical sections. Also fixed stale diurnal.py docstring that had the
delegation direction backwards.
_parse_weights was silently dropping content_class values that don't
belong on their target list with no operator feedback. Changed it to
return (weights, dropped), apply_payload to collect and return all
dropped names, and put_config to include dropped_entries in the
response when non-empty.
get_config was calling planner.apply_payload on every GET request, racing
concurrent reads on module-level globals. Added a _hydrated flag + lock
so DB hydration runs at most once per process lifetime; put_config marks
it done too. Test fixture resets the flag between tests.
ServiceNotFoundError (→ 404) and ServiceConflictError (→ 409) replace the
"not found" / "already on" / "not on" substring checks in _map_mutation_error;
base ServiceMutationError still maps to 422. Fixes three pre-existing test
status-code assertions (201 vs 200 on POST endpoints).
Pure tarball construction (_build_tarball, _render_*, _iter_included,
_SYSTEMD_UNITS) moved to decnet/swarm/bundle_builder.py — no FastAPI
dependency, independently testable. EnrollBundleRequest/Response moved
to decnet/web/db/models/swarm.py alongside the other swarm DTOs.
Router drops from 504 to 260 lines; keeps only the in-memory token
registry, sweeper, and endpoints.
- DeckyServiceAddRequest gains an optional `config: dict` field, validated
against the service's config_schema before any state mutation (400 on
bad type, no half-written rows).
- Engine: add_service threads `config` into _add_topology_service /
_add_fleet_service, persisting validated cfg to decky_config.service_config
BEFORE compose regen so the first `up -d --build` materialises the env on
the new container. No follow-up apply needed.
- Frontend: shared AddServiceConfigModal — same wizard accordion shape, used by:
* DeckyCard's ADD SERVICE picker (Fleet & MazeNET inspectors via shared component)
* MazeNET Inspector's ADD SERVICE picker
* MazeNET palette drag-drop onto a deployed decky
Empty-schema services short-circuit to a one-click add (no modal flash).
Operator can cancel; errors surface in the modal.
- Tests: add_service config plumbing — persist, drop unknown keys, 400-equivalent
on bad types, back-compat empty-config.
- Drive-by: fix stale repo-method names in test_services_live.py
(create_topology_decky → add_topology_decky, get_topology_decky → list+pick helper,
service.added → service_added topic).
- GET /topologies/services/{name}/schema serves the declared ServiceConfigField
metadata so the Inspector can auto-render forms.
- PUT /(topologies/{id}/)deckies/{decky}/services/{svc}/config persists the
validated dict (DB + compose); container untouched (Save).
- POST /(topologies/{id}/)deckies/{decky}/services/{svc}/apply persists then
force-recreates <decky>-<svc> so the new env takes effect (Apply, destructive).
- New engine helper update_service_config wires both fleet and topology paths
through the existing _persist_fleet_change / _rerender_topology_compose
machinery; emits decky.<name>.service_config_changed on the bus.
Bus topic segments are NATS-style tokens and the validator at
bus/topics.py:402 rejects '.', '*', '>', whitespace. My W3 constants
'service.added' / 'service.removed' tripped this on every live
add/remove call:
ValueError: topic segment 'service.added' may not contain '.', ...
Renamed both to underscore form: DECKY_SERVICE_ADDED = 'service_added'.
Aligned the SSE forwarder's name mapping (decky.<name>.service_added →
SSE event 'decky.service_added') and the frontend's
useTopologyStream listener + MazeNET.tsx event handler. Also updated
the wiki entry with a note about the underscore.
The /topologies/{id}/events SSE proxy now subscribes to two bus
patterns concurrently and merges them through a bounded asyncio.Queue:
* topology.{id}.> — lifecycle (status, mutation.*) — unchanged.
* decky.> — per-decky events, filtered by payload.topology_id
so a fleet decky sharing a name with a topology
decky doesn't leak across.
_sse_name_for routes 'decky.<name>.service.added' to the SSE event
name 'decky.service.added' (kept the prefix so the frontend doesn't
collide with topology lifecycle events that share leaf names like
'status').
useTopologyStream surfaces the two new event names; MazeNET.tsx's
onStreamEvent optimistically patches the matching node's services
list so a second tab reflects shape changes without a refetch.
Adds a fleet_singletons array to ServiceCatalogResponse so per-decky
add UIs can filter out services like LLMNR that run once fleet-wide
(and would 422 server-side at the live add endpoint).
The existing 'services: list[str]' field is unchanged for back-compat
with MazeNET/useMazeApi.ts:257; the new field is additive.
decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a
module-scoped cache (registry only changes on BYOS install / plugin
drop, neither of which happens mid-session) and exposes a precomputed
.perDecky list so consumers don't need to re-derive the diff.
decnet.engine.services_live exposes add_service / remove_service for
both fleet and topology decky scopes. The host's _compose() wrapper
already supported per-service targeting (up --no-deps -d <svc>,
stop, rm -f); what was missing was the orchestration around it:
* add: validate against decnet.services.registry (rejects unknown +
fleet_singleton); persist the new services list; re-render the
per-scope compose file (so future redeploys reflect the change);
run docker compose up -d --no-deps --build <decky>-<svc>.
* remove: stop + rm -f the service container; persist; re-render
compose so a future up -d doesn't bring it back.
Both publish decky.<name>.service.added / .removed on the bus, with
the post-mutation services list. Topic constants added to
decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md
ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored).
Four new admin endpoints:
* POST/DELETE /api/v1/deckies/{name}/services{,/svc}
* POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc}
ServiceMutationError messages are mapped at the API boundary to 404
(decky/topology missing), 409 (idempotency violation), 422 (unknown
or fleet_singleton service).
Extracts the docker-exec-with-base64-stdin pattern out of canary/planter
and orchestrator/drivers/ssh into a shared decnet.decky_io package.
Both consumers now delegate; the canary planter test still proves the
contract end-to-end.
Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops.
Container resolution is shared with the canary path: topology_id absent
means fleet (<name>-ssh), present routes through resolve_decky_container
which picks <name>-ssh when the topology decky exposes ssh, else the
topology base container decnet_t_<id8>_<name>.
Path validation rejects relative paths and '..' traversal at the request
model layer. Bad base64 → 400; unknown topology → 404; decky not in
topology → 422; docker exec failure → 409.
POST /api/v1/canary/tokens grows an optional topology_id field. When
present, the server hydrates the topology, validates the named decky is
in it, and resolves the docker container via
planter.resolve_topology_container — <name>-ssh if the decky exposes ssh,
else the topology base container. Absent ⇒ fleet semantics, unchanged.
The token row gets a nullable topology_id column (no migration helper
per pre-v1 policy). GET /api/v1/canary/tokens accepts ?topology_id= as
a filter. DELETE re-resolves the container at revoke time so a
redeployed topology is still reachable.
422 when the named decky isn't in the topology; 404 when the topology
itself doesn't exist.