Compare commits

551 Commits

Author SHA1 Message Date
f47dd4b520 docs: update README to reflect current codebase state
Some checks failed
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Merge testing → main (push) Has been skipped
CI / Lint (ruff) (push) Successful in 20s
CI / SAST (bandit) (push) Successful in 32s
CI / Dependency audit (pip-audit) (push) Failing after 1m25s
CI / Merge dev → testing (push) Has been skipped
Rewrites the architecture section for the full current module tree and adds
new sections for the REST API, swarm/agent mode, service bus, attacker
intelligence stack (profiler, clustering, correlation, GeoIP/ASN),
MazeNET topology, canary tokens, and TTP tagging/export. Updates the CLI
reference table, test count (478 → 5050), and Python version constraints.
2026-05-26 00:57:40 -04:00
f2b3393669 chore: relicense to AGPL-3.0-or-later and add SPDX headers
Replaces LICENSE (GPLv3 -> AGPLv3) and prepends
`SPDX-License-Identifier: AGPL-3.0-or-later` to every source file
across decnet/, decnet_web/, tests/, scripts/, and tools/.

Rationale: closes the GPLv3 ASP loophole so any party operating a
modified DECNET as a network service must offer their modified
source. Personal copyright (Samuel Paschuan) + inbound=outbound
contributions make a future unilateral relicense infeasible.

- LICENSE: full AGPL-3.0 text (gnu.org/licenses/agpl-3.0.txt)
- COPYRIGHT: project copyright notice
- tools/add_spdx_headers.py: idempotent header injector
  (shebang- and PEP 263-aware)

Touches 1565 source files (.py, .ts, .tsx, .js, .jsx, .css, .sh).
No behavior change; comments only.
2026-05-22 21:04:16 -04:00
ee10b55cfe fix(engine): per-scope docker compose project names
Every compose invocation used -p decnet so fleet + every topology
lived in one docker compose project. --remove-orphans, run during
fleet pre-up cleanup and on every topology teardown / rollback, then
swept every container in the project not listed in the current compose
file — wiping sibling topologies and the flat fleet along with the
intended target.

Parameterize project on _compose / _compose_with_retry / _compose_ps
(default FLEET_COMPOSE_PROJECT="decnet"). Add _topology_compose_project
that returns decnet-topo-<id8>, and pass it through every topology
compose call site (master deploy_topology + rollback + post-deploy ps,
master teardown_topology, agent apply, agent teardown, all four live
service mutations on topology deckies). Fleet calls keep the default
and are unaffected.

Migration: live containers from before this fix remain in the shared
"decnet" project and need a one-time manual cleanup before they're
reachable to the new topology code paths.
2026-05-22 18:29:33 -04:00
1b90048715 fix(api): /deckies/deploy becomes additive by default
The wizard POSTs only the new decky on each submit. The handler used to
treat every INI as the complete desired fleet (config.deckies = INI) so
the reconciler tore down prior deckies as orphans — deploying a second
Windows workstation silently wiped the first.

Add replace_fleet to DeployIniRequest (default false). Default path
merges new deckies into existing config and rejects name/IP collisions
with 409. replace_fleet=true preserves set-desired-state semantics for
CLI / declarative callers. Lifecycle rows are created only for the
deckies submitted in the current call, so /deckies/lifecycle?ids=...
reflects exactly what this submit deployed.

build_deckies_from_ini gains reserved_ips so additive auto-allocation
skips IPs already held by the existing fleet.
2026-05-22 18:14:50 -04:00
5b13a01ab6 feat(web): deploy wizard polls async lifecycle instead of holding HTTP
- New useLifecyclePolling(ids, intervalMs) hook: polls
  GET /deckies/lifecycle?ids=... every 2s until every row is terminal,
  surfaces transient HTTP failures without giving up.
- DeployWizard: drops the 180s axios timeout and the fake-log-driven
  deployOk flag. After POST 202, sets lifecycle_ids -> the hook drives
  the per-decky pill grid (PENDING / RUNNING / SUCCEEDED / FAILED).
  Real terminal lines stream into the log as rows resolve. Auto-close
  on all-success after 700ms.
- DeckyFleet.css: .lifecycle-grid + .lifecycle-pill in the existing
  fleet vocabulary; running pill pulses, failed pill borders alert.
- Existing 4 wizard render tests still pass; 4 new hook tests cover
  empty ids / single-success / polling-until-terminal / HTTP error.
2026-05-22 17:50:26 -04:00
eacac9aa60 feat(api): GET /deckies/lifecycle + master startup sweep
GET /deckies/lifecycle?ids=<uuid>&ids=<uuid> returns the matching
DeckyLifecycle rows so the wizard can poll instead of holding an HTTP
request open across compose work. require_viewer gating -- read-only.

Startup sweep: on master boot, any pending/running row with
started_at older than 1h flips to failed with
error='master restarted during operation'. Pre-v1 substitute for a
durable task queue: if the master crashes mid-deploy, the wizard sees
FAILED on refresh and the operator retries. Idempotent + cheap; runs
unconditionally including in contract-test mode.
2026-05-22 16:44:17 -04:00
4743c8f733 feat(api): /deckies/deploy and /mutate become 202 fire-and-forget
This is the unblock for the wizard hang. Both endpoints used to run
docker compose synchronously inside the HTTP handler -- on master
(unihost) or via asyncio.gather of worker /deploy POSTs at 600s
timeout each (swarm) -- blocking every other API request.

New flow:
  1. Commit the new config shape to repo state (fast).
  2. Create one DeckyLifecycle row per decky (status=pending).
  3. Spawn asyncio.create_task(run_deploy / run_mutate) -- the
     lifecycle runner drives rows through running -> succeeded|failed
     and emits decky.<name>.lifecycle on the bus.
  4. Return 202 with {lifecycle_ids: [...]}. Wizard polls
     GET /deckies/lifecycle?ids=... (next commit).

mutator/engine.py gains pick_new_services() -- shared between the
async API path and the watch-loop's synchronous mutate_decky().

DeployResponse grows lifecycle_ids[]. The old dispatch_decnet_config
helper still exists for the CLI swarm-deploy command path; it just
isn't called from the API handler anymore.

Test changes: 200 -> 202, drop dispatch_decnet_config mocks (handler
no longer calls it), assert lifecycle_ids in response + committed
state matches expectations.
2026-05-22 16:40:55 -04:00
e5e2bec3aa feat(swarm): heartbeat handler applies lifecycle deltas
HeartbeatRequest grows an optional lifecycle field carrying per-decky
completion records from the worker:
  [{decky_name, operation, status, error?, completed_at?}]

For each delta, the master finds the most-recently-started open
DeckyLifecycle row for (decky_name, operation, host_uuid) and flips
it to terminal with the worker's error text + timestamp. Stale
duplicates (row already sealed or never existed) are logged and
dropped -- not errors.

Each successful pivot also emits decky.<name>.lifecycle on the bus
so the dashboard sees the transition without waiting for its next
poll tick.

This is the master-side completion channel for the worker's 202
fire-and-forget /deploy and /mutate.
2026-05-22 16:33:48 -04:00
d1ca96b2f4 feat(agent): /deploy and /mutate become 202 fire-and-forget
The wizard API used to hang because /deckies/deploy ran docker compose
build && up -d synchronously, holding the request thread for minutes.
The worker side of that pipeline now returns 202 Accepted immediately
and runs the deploy in an asyncio.create_task.

On task completion (success or failure) the worker pushes a one-off
heartbeat carrying a lifecycle delta per decky:
  {decky_name, operation, status: succeeded|failed, error?, completed_at}

Master pivots these onto open DeckyLifecycle rows in the heartbeat
handler (next commit).  The scheduled 30s heartbeat tick is the
fallback if the immediate push drops.

- decnet/agent/app.py: /deploy and /mutate return 202; dry_run mutate
  still validates synchronously and returns 200.
- decnet/agent/executor.py: deploy_async + mutate_async wrap the work
  and push the completion delta.
- decnet/agent/heartbeat.py: push_lifecycle_delta() helper builds a
  one-off body and POSTs with the same mTLS context as the loop.
- decnet/swarm/client.py: revert deploy/mutate to control timeout
  (master no longer holds the HTTP request open for compose work).

Worker state.json gains no lifecycle field -- master DeckyLifecycle is
the source of truth; the master sweep handles crashed-mid-deploy
recovery.
2026-05-22 16:31:23 -04:00
c0ad380020 feat(lifecycle): runner + strategies + bus topic
Add decnet.lifecycle package: pure orchestration layer that the
master API will invoke via asyncio.create_task to drive DeckyLifecycle
rows through pending -> running -> succeeded | failed without
holding an HTTP request open.

Strategy classes per (operation, transport):
- LocalDeployStrategy: master-resident, runs engine.deployer.deploy
  in a thread.
- SwarmDeployStrategy: shards by host_uuid, dispatches via
  AgentClient.deploy; worker drives terminal via heartbeat.
- LocalMutateStrategy: write_compose + compose up.
- SwarmMutateStrategy: AgentClient.mutate; worker drives terminal.

decnet.bus.topics gains decky_lifecycle(name) -> decky.<name>.lifecycle
plus DECKY_LIFECYCLE constant. Payload documented in the wiki
(separate commit). publish_safely keeps bus best-effort.

Nothing is wired to call this yet -- next commits convert worker
/deploy /mutate to 202, then heartbeat delta wiring, then master API.
2026-05-22 16:25:33 -04:00
05c0721a51 feat(db): add DeckyLifecycle table for async deploy/mutate tracking
One row per (decky, operation) attempt. State machine:
pending -> running -> succeeded | failed (+ error text). Rows are
append-only after terminal; retries write a new row.

Sibling of DeckyShard rather than a rework -- DeckyShard tracks
runtime container state observed via heartbeat, this tracks
operation lifecycle. New table, UUID PK.

Adds BaseRepository abstract methods (create_lifecycle,
update_lifecycle, get_lifecycle_by_ids, find_open_lifecycle,
sweep_stale_lifecycle) with SQLModelRepository mixin impl.
Backbone for the upcoming 202-Accepted async API.
2026-05-22 16:20:00 -04:00
ade8bbe30a feat(agent): real worker-side /mutate with master swarm dispatch
- Implement /mutate handler: load_state, update services + last_mutated,
  save_state, write_compose, compose up -d via asyncio.to_thread. 404
  for missing state / unknown decky_id. dry_run short-circuits before
  any side effect.
- Add AgentClient.mutate(decky_id, services, *, dry_run=False) using
  _TIMEOUT_DEPLOY (compose up can pull/build, exceeds control timeout).
- mutator/engine.py: in swarm mode with decky.host_uuid set, resolve
  worker via _resolve_swarm_host and dispatch through AgentClient.mutate
  instead of writing a compose file on master. Master-resident deckies
  (unihost mode, or swarm with host_uuid=None) keep the local path.
2026-05-22 16:14:46 -04:00
418245f9b4 chore(deps): bump starlette to 1.0.1 (PYSEC-2026-161) 2026-05-22 16:14:35 -04:00
8eccb260be feat(dns-service): expose DNS_STATE_PATH config field
Adds state_path ServiceConfigField and passes DNS_STATE_PATH into the
container environment. Operator must mount the parent directory on a
volume for persistence to survive container recreation.
2026-05-21 22:10:43 -04:00
757aff4671 feat(dns): persist tunneling burst state across restarts
Switch burst deque from monotonic() to time.time() (wall-clock, serializable).
Add DNS_STATE_PATH env var: on startup _load_state() reads {src:[ts,...]} JSON
and prunes entries older than the burst window. _flush_state() write-then-renames
atomically; _state_flusher() coroutine flushes every 5s when dirty. Detection of
the 5th event also triggers an immediate flush. No-op when DNS_STATE_PATH is
unset, so the default deployment is unchanged.
2026-05-21 22:10:10 -04:00
457e2d990c feat(dns): count NULL/CNAME/AAAA/PRIVATE in tunneling burst window
Rename _txt_times -> _tunnel_times. Add TYPE_CNAME=5, TYPE_NULL=10,
TYPE_PRIVATE=65399 constants. Guard burst counter with _TUNNEL_QTYPES
frozenset instead of TYPE_TXT only. Mixed-type queries from one source
now share a single burst window, closing iodine NULL/CNAME downlink
and AAAA-encoded uplink evasion gaps.
2026-05-21 22:07:58 -04:00
9e3473b370 feat(dns): full-subdomain entropy check catches short-label exfil
_is_tunneling now returns str|None (the detection method) instead of bool.
Two new tunables _QNAME_TOTAL_LEN_THRESHOLD=50 and _QNAME_ENTROPY_THRESHOLD=3.5
catch attackers who split a high-entropy payload across multiple short labels.
tunnel_method field added to tunneling_suspect events for downstream correlation.
2026-05-21 22:06:14 -04:00
a6b5b1a7f8 feat(dns): full EDNS sub-option parsing and NSID request detection
_parse_edns_size only extracted the requestor UDP size; every other field in
the OPT record (DO bit, EDNS version, extended RCODE, all sub-options) was
invisible.  Replaced with _parse_opt_record returning a full dict:
  udp_size, ext_rcode, version, do_bit, z, options[(code, len, data)]

NSID request (option code 3) is now detected as fingerprint_probe with
probe=edns_nsid and contributes to recon_burst.  DO bit, COOKIE (10), and
other options are not escalated; udp_size continues to drive amp_probe.
2026-05-21 21:20:57 -04:00
4dadeb9aba feat(dns): detect non-zero OPCODE and anomalous header-flag combinations
Tools like fpdns send OPCODE=IQUERY/STATUS/NOTIFY/UPDATE or set the reserved
Z bit to fingerprint resolver behaviour.  Previously all these were parsed as
standard queries with no signal.

  - opcode!=0 → fingerprint_probe probe=opcode_<name>, NOTIMP response;
    fired before qdcount check so qdcount=0 UPDATE packets are still caught.
  - Z bit set OR (AD+CD without RD) → fingerprint_probe probe=header_flags;
    AD alone with RD is ignored to avoid tagging DNSSEC-aware stubs.
  - Both variants contribute to recon_burst.
2026-05-21 21:19:01 -04:00
35159419bb feat(dns): detect CLASS=ANY queries as fingerprint_probe
qclass=255 in a standard query is unusual enough to be a fingerprinting probe
(fpdns, various scanner scripts).  Previously it was logged as a plain query
with qclass=ANY in the event field; now it emits fingerprint_probe with
probe=qclass_any and returns REFUSED — consistent with how we treat other
probe types.  Contributes to recon_burst.
2026-05-21 21:16:47 -04:00
521d77b28f feat(dns): hoist CHAOS probe map to module level, add authors.bind. entry
The inline probe_map dict inside _handle made tests blind to the probe
catalogue and couldn't be extended without touching the hot path.  It is now
module-level _CHAOS_PROBE_MAP.  authors.bind. joins the three existing entries
so it gets named correctly instead of carrying the raw qname.
2026-05-21 21:15:58 -04:00
629f969eb6 feat(dns): emit multi_question event when qdcount>1
Packets with multiple questions were silently parsed at q0 only; the extra
questions were invisible.  Now emits multi_question at severity=5 with the
qdcount and q0 qname, then falls through and answers q0 normally.
2026-05-21 21:14:50 -04:00
db798f5a5b feat(dns): emit events on malformed/headerless/question-parse-error packets
Silent drops on <12B packets, qdcount=0, and question-section ValueError gave
fuzzers and scanners a completely dark target.  New events malformed_packet,
empty_question_section, and question_parse_error fire at severity=5 so these
probes are visible without counting toward recon_burst.
2026-05-21 21:13:46 -04:00
da2ad7a82a feat(dns): global upstream forward rate limit with sinkhole fallback
Adds DNS_FORWARD_BUDGET (default 50) and DNS_FORWARD_WINDOW (default 1.0s)
env vars. _can_forward() maintains a rolling deque of upstream call
timestamps; queries that exceed the budget within the window are answered
with the sinkhole (127.x) instead of being forwarded, making the honeypot
ineligible as a sustained amp vector even when real_recursive is enabled.
Rate limit is global (not per-source) so IP-spoofed amplification floods
hit the ceiling regardless of how many source addresses are rotated.
2026-05-21 20:50:20 -04:00
e5847b7e1e feat(dns): real recursive forwarding with sinkhole fallback
When DNS_REAL_RECURSIVE=true and DNS_ZONE_MODE=recursive, out-of-zone
queries are forwarded to DNS_UPSTREAM (default 8.8.8.8:53) via async
UDP. Upstream response is relayed as-is; on timeout or error the
already-computed sinkhole (127.x) is returned instead.

_handle() always runs first so logging, tunneling detection, flood
tracking, and recon-burst aggregation fire on every query regardless
of whether the response ultimately comes from upstream. _dispatch()
overlays forwarding on top of the sync handler.

Protocol handlers (UDP datagram_received, TCP session) are now async
via asyncio.ensure_future / await _dispatch(). Service class exposes
real_recursive (bool) and upstream (string) config fields.
2026-05-21 20:49:19 -04:00
8f33f1b849 fix(dns): recursive mode now returns sinkhole A answer, not NXDOMAIN
RA=1 + empty answer section is immediately detectable as fake by any
open-resolver scanner. Recursive mode now behaves like open mode
(127.0.0.x sinkhole, deterministic on qname) with RA=1 and AA=0,
matching what a real recursive resolver returns.
2026-05-21 20:40:27 -04:00
bbb126e435 feat(dns): fix three operational blind spots — flood detection, AAAA, recon burst
- Add per-src QPS counter (_qps_window) with flood_suspect event at ≥50 qps/10s;
  one event per src per 30s cooldown, does not suppress baseline query events.
- Add tracking_evicted telemetry every 100 LRU evictions so IP-rotation evasion
  of _txt_times/_qps_window/_recon_window is observable, not silent.
- Shared _track_lru helper consolidates LRU touch + eviction signalling across
  all three bounded OrderedDicts.
- Add TYPE_AAAA=28 support: _fake_ipv6() returns deterministic ULA (fd::/8)
  addresses for in-zone names; extra_records parser now accepts and validates
  AAAA entries via socket.inet_pton.
- Add per-src recon-burst aggregation (_recon_window): fingerprint_probe +
  zone_transfer + amp_probe are tracked per source in a 60s window; recon_burst
  fires when ≥2 distinct signal types seen, once per src per 120s cooldown.
- 47 tests passing (19 new across TestAAAARecords, TestFloodDetection, TestReconBurst).
2026-05-21 19:50:09 -04:00
77a466e615 feat(dns): add BIND-flavored DNS honeypot service
Python asyncio DNS server on UDP+TCP/53 masquerading as BIND 9.x.
Emits four event_type values: query, fingerprint_probe (version.bind /
hostname.bind / id.server CHAOS), zone_transfer (AXFR/IXFR, always
REFUSED), amp_probe (qtype=ANY or EDNS udp_size>1232), and
tunneling_suspect (long high-entropy labels or rapid TXT burst).

Zone persona is generated per-decky from instance_seed (domain name,
SOA serial, NS, A, MX, TXT SPF); overridable via config_schema.
Three zone modes: auth (default), recursive, open (sinkhole).
2026-05-21 19:07:49 -04:00
72cdeb3270 chores: deleted some trash and updated the development roadmap 2026-05-21 16:21:54 -04:00
e292fd7d05 feat(web): surface bgp_prefix and rpki_status in AttackerDetail and export
AttackerData type gets bgp_prefix / rpki_status / rpki_source.
TimelineSection renders prefix inline next to AS number; RPKI status
shows as a green RPKI VALID / red RPKI INVALID badge, or dim
NO ROA for not-found. rpki-status-badge CSS added to Dashboard.css.
Export network block extended with the three new fields.
2026-05-21 16:17:38 -04:00
e1eda1e754 feat(profiler): wire enrich_rpki into _build_record
Import enrich_rpki from decnet.rpki and call it inline after the
ASN lookup. bgp_prefix, rpki_status, rpki_source added to the
record dict that feeds the Attacker upsert. enrich_rpki short-circuits
to (None, None) when asn is None, so private / unannounced IPs
never hit RIPE STAT.
2026-05-21 16:14:51 -04:00
49b4996956 feat(model): add bgp_prefix, rpki_status, rpki_source to Attacker
bgp_prefix (max 43 chars, indexed) holds the covering CIDR from
the ASN lookup. rpki_status / rpki_source hold RIPE STAT validation
outcome. All nullable — null means enrichment was skipped or ASN
did not resolve.
2026-05-21 16:13:31 -04:00
b799ade816 feat(rpki): ripestat validator + sqlite cache
RipeStatValidator makes two RIPE STAT calls per uncached IP:
network-info -> announced prefix, rpki-validation -> ROA state.
2-second timeout; any network failure returns status='unknown'.

SQLite cache keyed by IP, 12-hour TTL, pruned on validator init.
Cache avoids per-event HTTP for the high-churn attacker pool —
steady-state cost approaches zero for repeat offenders.
2026-05-21 16:13:01 -04:00
1a11287f76 feat(rpki): provider scaffold — base, factory, paths, ripestat skeleton
New decnet/rpki/ module mirrors decnet/asn/ shape. Validator ABC,
lazy singleton factory (DECNET_RPKI_PROVIDER=ripestat default),
paths.py with DECNET_RPKI_ROOT override. RipeStatValidator stub
returns 'unknown' unconditionally — HTTP wired in next commit.

enrich_rpki(ip, asn) -> (status, source) | (None, None); short-circuits
on DECNET_RPKI_ENABLED=false or asn=None.
2026-05-21 16:10:01 -04:00
e3d9908bed feat(asn): expose BGP prefix in AsnInfo and enrich_ip
Synthesize the covering CIDR at lookup time from the matched iptoasn
range using ipaddress.summarize_address_range. AsnInfo.prefix is
populated per-query; not persisted in the pickle cache.

enrich_ip now returns (asn, as_name, bgp_prefix, provider_name).
Profiler worker updated to unpack the 4-tuple and write bgp_prefix
into the attacker record dict.
2026-05-21 16:07:57 -04:00
f160eccdae fix(ui): scope empty-state color and font to dashboard context — suppress matrix-green bleed 2026-05-21 15:44:08 -04:00
cd3c1104b4 fix(ui): replace ad-hoc fingerprints empty state div with EmptyState component 2026-05-21 15:40:10 -04:00
28f26cc5f3 fix(ui): replace info-banner empty state in BehaviouralPrimitivesPanel with EmptyState 2026-05-21 15:23:33 -04:00
946636d8f4 feat(ui): wire icmp_error / icmp6_error fingerprint probes into AttackerDetail 2026-05-21 15:12:39 -04:00
2af46ed102 feat(ingester): promote icmp_error / icmp6_error probe fields to fingerprint bounties 2026-05-21 15:10:07 -04:00
3f8170be10 feat(prober): add Icmp6ErrorProbe — ICMPv6 error-leakage fingerprint
Four RFC 4443 stimuli (port-unreach, hop-limit-exceeded, unknown-NH,
bad-dest-option) produce a 4-char matrix + sha256 fingerprint for IPv6
attackers. Auto-registers via ActiveProbeMeta at priority=860 (after v4
icmp_error=850, before ipv6_leak=999). IPv4 targets fast-return None.
2026-05-21 15:03:10 -04:00
56229a272b feat(prober): add IcmpErrorProbe — ICMP error-leakage fingerprint
Sends four crafted stimuli (UDP/closed-port, TTL=1, DF+oversized,
bad IP option) and records which ICMP error classes come back, the
per-error RTT, and the bytes echoed in each ICMP body. Absence is
as informative as a reply — Linux rate-limiting is a fingerprint signal.

Returns None when no packets could be sent (no CAP_NET_RAW), so the
probe is a no-op in non-root test environments. Port-free ActiveProbe
subclass (priority=850), metaclass auto-registered in the registry.

Also fixes three sets of stale tests left over from the TlsCertProbe
migration (4b2759e0):
- test_active_probe_registry: closed name/order sets updated for
  tls_certificate and icmp_error
- test_prober_rotation: dead patches on worker.fetch_leaf_cert removed
- test_prober_worker (TestProbeCycleTLSCert): rewritten to test
  TlsCertProbe as an independent registry probe, patch target updated
  from worker.fetch_leaf_cert to probes.tlscert_probe.fetch_leaf_cert
2026-05-21 14:52:49 -04:00
4b2759e0fc refactor(prober): absorb TlsCertProbe into ActiveProbe registry
TLS cert capture was the last prober special-case that bypassed
ActiveProbeMeta. Moves logic into TlsCertProbe (priority=200, runs
after JARM) in probes/tlscert_probe.py; drops _capture_tls_cert,
the probe.probe_name=="jarm" name-check, and the direct
fetch_leaf_cert import from worker.py.
2026-05-21 14:32:07 -04:00
bd4700770b refactor(prober): generalise ActiveProbe registry to absorb Ipv6LeakProbe
ActiveProbe.run/syslog_fields/publish_payload now accept port=None so
non-port-iterating probes can live in the registry. Ipv6LeakProbe replaces
the hand-rolled _ipv6_leak_phase special case in worker.py; it runs last
via priority=999. _probe_cycle no longer has an ad-hoc phase call.

Fixes three stale test files (test_prober_bus, test_prober_rotation,
test_prober_worker) that were broken since the 916b21b6 registry refactor.
2026-05-21 14:27:48 -04:00
b80e621904 fix(prober): consolidate ip route get to single call + log bare excepts
_route_info() calls _ip_route_get once and returns (on_link, iface);
worker._ipv6_leak_phase now calls it instead of the two separate helpers.
Bare except clauses at _ip_route_get and response parse now log at debug.
2026-05-21 14:16:42 -04:00
1123e50325 fix(sniffer): add missing syslog_bridge.py to template build context 2026-05-20 22:22:47 -04:00
6865abcff9 chore(make): drop cowrie from build-all (legacy, unused) 2026-05-20 22:20:09 -04:00
dee208ad25 feat(make): add build-all to pre-build all 28 decky template images
Iterates every template with a Dockerfile, builds decnet/<svc>:latest
with DOCKER_BUILDKIT=1. Supports NO_CACHE=1 and FAIL_FAST=0 flags,
mirrors the style of test-all. Updated help target.
2026-05-20 22:19:40 -04:00
a0f10d2c00 feat(ui): add renderers for ja4h, http2/3 settings, ja4-quic fingerprints
FingerprintGroup switch fell through to FpGeneric (raw JSON dump) for all
four new fingerprint_type values the ingester now produces. Add FpJa4h,
FpHttpSettings, FpJa4Quic components and wire them into the dispatcher;
also register their labels and icons in fpTypeLabel/fpTypeIcon.
2026-05-20 22:15:02 -04:00
7bac3a29c6 fix(ingester): retry get_state on startup DB errors; bump deps + rename behave packages
ingester: wrap bootstrap get_state() in forever-retry loop — MySQL coming
up after the API process killed the ingestion task permanently before it
ever entered _run_loop. Regression test added.

deps: idna 3.13→3.15 (CVE-2026-45409), twisted 26.4.0rc2→26.4.0
(PYSEC-2026-160), pip 26.1→26.1.1 (CVE-2026-3219 resolved upstream),
behave-core/behave-shell renamed from decnet-behave-* and bumped to 0.1.1.
pre-commit hook updated to reflect current ignore list.
2026-05-20 22:10:15 -04:00
916b21b652 refactor(prober): ActiveProbe ABC + ActiveProbeMeta registry
Replace _jarm_phase / _hassh_phase / _tcpfp_phase boilerplate (3×~50
lines of identical port-iteration logic) with a metaclass-registered ABC.
Adding a new port-iterating active probe is now one class + three methods.

- decnet/prober/base.py: ActiveProbeMeta auto-registers subclasses by
  probe_name; ActiveProbe ABC enforces run/syslog_fields/publish_payload
  with env-driven DECNET_PROBE_PORTS_<NAME> port override.
- decnet/prober/probes/{jarm,hassh,tcpfp}.py: concrete probe classes.
- decnet/prober/worker.py: single _run_probe driver replaces the three
  phase functions; _probe_cycle iterates ActiveProbeMeta.all(); drops
  the ports=/ssh_ports=/tcpfp_ports= kwargs from prober_worker.
- IPv6 leak and TLS cert capture stay as special cases (different call
  shapes; intentionally outside the registry).
- tests/prober/test_active_probe_registry.py: registry contents, sort
  order, priority-10 override, ABC contract per probe class.
- tests/prober/test_run_probe_driver.py: dedup, success, None-skip,
  exception, rotation, publish paths for _run_probe.
- tests/prober/test_prober_worker.py: updated patch targets and
  _probe_cycle call sites; port control via monkeypatch.setattr.
2026-05-17 23:16:35 -04:00
3977f06374 feat(ttp/ipv6_leak): wire Ipv6LeakLifter into composite tagger and worker
- Add "ipv6_leak" to KNOWN_SOURCE_KINDS in ttp/base.py
- Register Ipv6LeakLifter(store) in factory.py get_tagger()
- Subscribe worker to attacker.fingerprinted; route by Event.type
  so JARM/HASSH/ipv6_leak share the topic without source_kind collision
- Add bump_attacker_ipv6_leak() to BaseRepository (abstract) +
  TTPMixin (implementation): increments ipv6_leak_count, sets last_ipv6_*
  denorm fields, appends-with-dedup to AttackerIdentity.ipv6_link_local_iids
- Call bump_attacker_ipv6_leak from _process_event after insert_tags
- Add DummyRepo stub + coverage call in tests/db/test_base_repo.py
2026-05-17 20:41:55 -04:00
11d9273c99 docs(bus): document ipv6_leak payload kind on ATTACKER_FINGERPRINTED
Add inline documentation for all known kind= discriminators on the
fingerprinted topic including the new ipv6_leak variant so future
consumers know what fields to expect without reading the prober source.
2026-05-17 20:22:55 -04:00
9056e33962 feat(ttp): Ipv6LeakLifter + R0059 rule for IPv6 link-local opsec failures
Ipv6LeakLifter subscribes to source_kind="ipv6_leak" events from both
the passive sniffer and active prober. Emits T1090 (Proxy) under TA0011
(C2) when fe80:: source address is observed — the attacker's VPN only
tunnels IPv4 so their link-local IID leaks their NIC identity.

Rule R0059 sets base confidence 0.85; iid_kind in the evidence carries
the per-observation strength (eui64 = MAC-derived, deterministic;
stable_privacy = RFC 7217; temporary = RFC 4941).
2026-05-17 20:22:26 -04:00
504340745e feat(prober): active IPv6 link-local solicitation phase
Add ipv6_leak.py with solicit_ipv6_leak() — sends ICMPv6 Echo to
ff02::1 on the attacker's iface and returns fe80:: evidence when a
link-local response arrives. Gated on _is_on_link(): skips when
attacker is behind a router (no L2 adjacency).

Add _ipv6_leak_phase() to worker.py (Phase 4 in _probe_cycle).
Phase runs once per attacker IP per cycle (sentinel at port 0 in
ip_probed["ipv6_leak"]) and publishes kind="ipv6_leak" via publish_fn.

Add list_v6_addrs(iface) to network.py: returns [(addr, scope)] for
all IPv6 addresses on an interface, required for source-routing ICMPv6
from the correct link-local address.
2026-05-17 20:20:19 -04:00
aa833ddda9 feat(sniffer): passive IPv6 link-local leak detection
Add _ipv6_iid_classify() to fingerprint EUI-64 vs stable-privacy IIDs
and derive the MAC OUI from EUI-64-encoded link-local addresses.
SnifferEngine._on_ipv6_packet() observes fe80::/10 sources destined for
known deckies and emits ipv6_link_local_leak syslog + bus events.
on_packet() now dispatches the IPv6 branch before the v4 TCP path.
BPF default widened from "tcp" to "tcp or ip6" so the sniff loop
captures IPv6 frames without config change.
2026-05-17 20:16:29 -04:00
69ecc4cc20 feat(models): add IPv6 link-local leak columns to Attacker + AttackerIdentity
Attacker gains five denormalized cache fields (ipv6_leak_count,
last_ipv6_leak_at, last_ipv6_link_local, last_ipv6_iid_kind,
last_ipv6_mac_oui) mirroring the rotation_count/last_rotation_at pattern.
AttackerIdentity gains ipv6_link_local_iids (JSON list[dict]) for
EUI-64-derived MAC cluster signals that survive VPN/IP rotation.
No ALTER TABLE helpers — direct SQLModel column additions per pre-v1 policy.
2026-05-17 20:12:08 -04:00
b390a35262 feat(ttp): add Ipv6LinkLocalLeakEvidence TypedDict + EVIDENCE_SCHEMA entry
Pins the evidence shape for IPv6 link-local leakage findings. All fields
optional (total=False) so partial observation (passive sniffer vs active
solicitation) fills whatever the vector provides. Lifter lands in a
subsequent commit.
2026-05-17 20:10:51 -04:00
3e6587e073 fix(lint): prefix unused params with _ to silence vulture 80% findings 2026-05-17 20:08:54 -04:00
4586e36d63 fix(test/schema): pin xdist_group to prevent multi-server startup, cap workers at 4
Some checks failed
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Merge testing → main (push) Has been skipped
CI / Lint (ruff) (push) Successful in 1m0s
CI / Dependency audit (pip-audit) (push) Failing after 1m2s
CI / SAST (bandit) (push) Successful in 1m11s
CI / Merge dev → testing (push) Has been skipped
2026-05-16 18:36:26 -04:00
8b3f74b39b fix(deps): pin urllib3>=2.7.0 to resolve CVE-2026-44431 and CVE-2026-44432 2026-05-16 18:26:47 -04:00
0fe9f895d0 feat(test): add test-schema target and SCHEMA_QUICK=1 mode for schemathesis
- Add dedicated test-schema Makefile target (xdist logical, 600s timeout,
  -m fuzz) so schemathesis runs separately from test-fuzz, which was
  spinning up competing uvicorn workers per xdist process
- Exclude all test_schemathesis*.py files from FUZZ_FLAGS via --ignore
- Add schema to _ALL_SUITES between api and fuzz
- Add SCHEMA_QUICK env var (default 0): caps every max_examples to 100
  across all four schemathesis files (4520 -> 600 total examples)
- Fix pre-push hook: use .311 venv and delegate to make test-all FAIL_FAST=0
  instead of hand-rolling five separate pytest invocations
2026-05-16 18:25:40 -04:00
ac332a6ba9 fix(live/mysql): use pytest_asyncio.fixture(loop_scope=module) on mysql_repo
@pytest.fixture on an async fixture ignores loop_scope, so mysql_repo
ran on the per-function loop while mysql_test_db_url's engine was bound
to the module loop — triggering 'Future attached to a different loop'.
2026-05-10 22:45:05 -04:00
e26876ee92 fix(makefile): add -m markers to live/docker/stress/bench targets 2026-05-10 22:43:33 -04:00
6a91858c15 fix(https-template): wire TLS_CERT/TLS_KEY into make_server ssl_context
Server read the env vars but never passed them to make_server, so it
served plain HTTP and the TLS handshake check timed out in live tests.
2026-05-10 22:39:24 -04:00
54dede5077 feat(makefile): add static analysis targets and xdist to SEQ_FLAGS
Add mypy, bandit, vulture, pip-audit as Makefile targets and include
them in test-all. Also enable -n logical on SEQ_FLAGS so live/api/stress
suites run in parallel where async-safe.
2026-05-10 22:37:30 -04:00
b41a7e3115 fix(live tests): use @pytest_asyncio.fixture for module-scoped async fixtures 2026-05-10 22:30:56 -04:00
ab18cd7797 fix(live tests): replace deprecated event_loop fixture with loop_scope="module" on async fixtures 2026-05-10 22:29:57 -04:00
0403cfc6a2 perf(pytest): switch xdist workers from -n 4 to -n logical 2026-05-10 22:28:04 -04:00
349f88252a chore: add Makefile with per-suite test targets; gitignore ATT&CK bundle and pytest dump 2026-05-10 22:27:54 -04:00
59d3351306 fix(fleet): strip digest from build_base tag before APT compatibility check; mark wizard done 2026-05-10 22:27:47 -04:00
80fff1efa4 fix(web): coerce fingerprint_type to string; sync frontend types and tests 2026-05-10 22:27:38 -04:00
a009746dd1 feat(fingerprint): extend syslog_bridge with HTTP/3 and JA4H fingerprinting emission 2026-05-10 22:27:22 -04:00
52f2f65fa3 fix(tests): fix stale asyncio.sleep patches and missing tarpit guards in service isolation tests
After the ingester._sleep alias fix, three tests in test_service_isolation.py
still patched `decnet.web.ingester.asyncio.sleep` (the old global-singleton
path). The ingester now calls `_sleep` directly, so those patches no longer
controlled the ingester's sleep — the worker looped with real asyncio.sleep
and the tests hung indefinitely.

Also: four API lifespan tests had no tarpit_watcher_worker patch, letting the
real tarpit task start. And test_api_survives_db_init_failure patched
`decnet.web.api.asyncio.sleep` (the singleton) instead of the existing
`_retry_sleep` alias.

Fixes:
- patch("decnet.web.ingester._sleep", ...) in the three ingester tests
- add tarpit_watcher_worker patch to all four api lifespan tests
- patch("decnet.web.api._retry_sleep", ...) in db_init_failure test
2026-05-10 22:10:54 -04:00
ff51ce55e2 fix(tests): eliminate tarpit OOM from global asyncio.sleep mock
Two interacting bugs caused asyncio.sleep to be mocked globally,
letting tarpit_watcher_worker spin the event loop on a non-async
mock and accumulate _increment_mock_call records without bound:

1. test_ingester.py patched `decnet.web.ingester.asyncio.sleep` via
   the asyncio singleton — any code in the process using asyncio.sleep
   (including the tarpit worker) hit the fake_sleep side_effect.
   Fix: add `_sleep = asyncio.sleep` alias in ingester.py and patch
   `decnet.web.ingester._sleep` instead — scopes the mock to ingester.

2. test_api_startup_guards.py called `_run_lifespan_startup` without
   DECNET_CONTRACT_TEST=true, which started the real tarpit task in a
   manually-constructed event loop that the tests never cancelled.
   Fix: set DECNET_CONTRACT_TEST=true inside _run_lifespan_startup so
   the lifespan skips all background workers.
2026-05-10 10:06:21 -04:00
a2c34cac02 fix(tests): prevent xdist worker OOM from leaked tarpit asyncio task
asyncio_default_fixture_loop_scope was 'module', so all async tests in
a module share one event loop. test_lifespan_startup_and_shutdown patched
log_ingestion_worker/log_collector_worker/attacker_profile_worker but not
tarpit_watcher_worker — the real while-True coroutine was created as an
asyncio task on the shared loop and never cancelled. The xdist worker ran
for 4+ hours (confirmed via py-spy + etime=04:48) consuming 15+ GB before
OOM-kill.

Fixes:
- Patch tarpit_watcher_worker in both TestLifespan tests
- Change asyncio_default_fixture_loop_scope to 'function' so each test
  gets its own loop; tasks cannot outlive their test
- Add loop_scope='module' to precision_engine which legitimately needs
  a module-scoped event loop
2026-05-10 09:53:25 -04:00
9a7b03700c refactor(intel): migrate AttackerIntel JSON-string columns to native SQLAlchemy JSON
Five list columns (greynoise_tags, abuseipdb_categories, threatfox_threat_types,
threatfox_ioc_types, threatfox_malware_families) and four dict columns
(*_raw) are now Column(JSON) with list/dict type annotations and
default_factory=list/dict. Providers return native Python objects; the
application-layer json.dumps/json.loads round-trip and _decode_json_list
helpers are gone. to_intel_event_payload() reads columns directly.

Also caps pytest xdist at -n 4 and excludes tests/api from norecursedirs
to prevent schemathesis workers from OOM-killing the dev loop.
2026-05-10 09:17:15 -04:00
de3634d739 feat(ttp): enable 6 xfail tests — evidence shape + tracing spans
- test_evidence_shape.py: replace broken (command, BehavioralLifter)
  pairing with correct (http_fingerprint, HttpFingerprintLifter) case;
  expand _LIFTER_CASES to 5-tuples with per-lifter payloads and rule
  factories; wire StubRuleStore + _index.install() per lifter; remove
  xfail marker — all 4 parametrized cases now pass

- factory.py: add _span() helper gated on _telemetry._ENABLED; wrap
  each per-lifter dispatch in _tag_one() that opens a
  ttp.lifter.{name} child span per call

- http_fingerprint_lifter.py: add missing name = "http_fingerprint"

- test_tracing.py: replace pytest.fail() stubs in
  test_lifter_child_spans_emitted and test_no_pii_canary_in_span_attributes
  with real test bodies; remove xfail markers
2026-05-10 08:51:07 -04:00
c39b63a431 test(ttp): enable test_dropped_intel_enriched_still_produces_intel_tags
Removes the E.3.14b xfail marker and writes the test body:
- _StubRepo gains get_attacker_intel_row_by_uuid(uuid) backed by an
  optional intel_rows dict; existing tests pass None (no catch-up, no
  change to their behaviour).
- The test drives a session.ended event with NO intel.enriched published,
  injects an AttackerIntel row into the stub repo, and asserts the
  tagger is called with source_kind='intel' carrying the correct payload
  fields (abuseipdb_score, greynoise_classification).
- Pins the asymmetry contract: email.received has no catch-up path
  (sibling test already green); intel does.
2026-05-10 08:30:44 -04:00
6e7020f2aa feat(ttp): implement E.3.14b intel catch-up via attacker.session.ended
On every attacker.session.ended event, the TTP worker now reads the
persisted AttackerIntel row (if any) and synthesizes an intel-source
TaggerEvent so intel-derived tags emit even when attacker.intel.enriched
was dropped or arrived before the worker started.

Key changes:
- AttackerIntel.to_intel_event_payload() — single source of truth for
  the intel-row → lifter payload projection; shared by future callers
  without importing decnet.intel.* (no-SPOF contract preserved).
- BaseRepository.get_attacker_intel_row_by_uuid() — returns the live
  SQLModel instance so the catch-up path can call to_intel_event_payload().
- _build_intel_catchup_event() in ttp/worker.py — looks up the intel row,
  builds the TaggerEvent, returns None on absent row (silence, not error).
- _process_event() extended: appends the catch-up event to tagger_events
  when topic contains "session.ended". Deterministic source_id keeps
  compute_tag_uuid idempotent across replays; INSERT OR IGNORE deduplicates
  against any prior attacker.intel.enriched path.

DummyRepo stub + coverage call added per feedback_run_base_repo_test.md.
2026-05-10 08:27:22 -04:00
471b33df1b feat(ttp): enable test_abuseipdb_score_30_dropped — impl was already done
Replace pytest.fail() stub with actual test body: constructs IntelLifter
with R0054, feeds score=30 payload, asserts confidence=0.21 (0.70×0.30)
which is below CONFIDENCE_FLOOR. xfail marker removed.

Corrects docstring: R0054 T1110 base_conf=0.70, not 0.85 as originally written.
2026-05-10 08:08:29 -04:00
39518e33b4 feat(ttp): implement evidence-shape validation and confidence range constraint
- TolerantTagger.tag validates evidence keys against EVIDENCE_SCHEMA TypedDicts;
  TypeError (programmer error) propagates instead of being swallowed
- IntelEvidence and EmailEvidence expanded from stubs to full per-provider
  key sets (total=False); IntelEvidence old stub fields replaced wholesale
- EVIDENCE_SCHEMA map added to models/ttp.py and imported by base.py
- TTPTag __table_args__ gains confidence [0,1] CheckConstraint (DB-enforced)
- xfail removed from test_confidence_outside_range_rejected_at_insert and
  test_evidence_shape_violation_propagates_as_typeerror — both now pass
- TypeError removed from _SWALLOWED_EXCS fuzz list; test_intel_evidence_keys
  updated to assert the real provider key set
2026-05-10 07:56:52 -04:00
a8f6a28f3a fix(test): pre-import decnet.cli at collection time to prevent agent-mode stripping
import decnet.cli as _decnet_cli at module level guarantees the app singleton is
built in master mode before any test can set DECNET_MODE=agent. Without this,
test_defence_in_depth_direct_call_fails_in_agent_mode triggered a fresh import
of decnet.cli with DECNET_MODE=agent active, which stripped master-only commands
and wrote the stripped module to sys.modules[decnet].cli — a parent-attribute
corruption that no sys.modules dict restore can fix.
2026-05-10 07:32:43 -04:00
8f6f56f481 fix(test): restore decnet.cli in sys.modules via monkeypatch to prevent agent-mode app stripping from leaking into subsequent tests 2026-05-10 07:18:50 -04:00
6fecf45dcd fix(orchestrator/tests): attribute access on TopologySummary, not dict
emailgen/scheduler.py: topology.email_personas/.language_default
test_heartbeat_topology_resync.py: row.needs_resync (5 occurrences)
2026-05-10 07:11:14 -04:00
4c8ef2f104 fix(orchestrator): _topology_personas accepts TopologySummary or dict 2026-05-10 07:08:39 -04:00
64610bf96e fix(tests): sync 4 tests to current production contracts
- SSH schema: add user + user_password fields (service extended post-test)
- TopologySummary: repo.get_topology() returns model now, not raw dict
- health live: tarpit_watcher added to get_background_tasks(), add to expected set
2026-05-10 06:48:42 -04:00
e4626879f6 perf(pytest): 194s → 4s collection — lazy heavy imports + norecursedirs
Four-part fix for the collection bottleneck that was blocking the dev loop:

1. Lazy mitreattack.stix20 import in attack_stix.py — deferred to first
   _load() call (TYPE_CHECKING guard at top level)

2. Lazy misp_stix_converter import in both MISP export routers — moved
   from module level into the route handler body

3. Lazy attack_catalog / attack_stix in ttp.py repo mixin — thin wrapper
   functions so the import chain never fires at module load time

4. tests/api/conftest.py — `from decnet.web.api import app` moved inside
   the `client()` fixture; `pytest_ignore_collect` broadened to skip all
   test_schemathesis*.py variants (not just test_schemathesis.py), which
   were launching a subprocess server at module-import time

5. pyproject.toml — `norecursedirs` for tests/live, tests/stress,
   tests/service_testing, tests/docker, tests/perf so these directories
   are never entered; `-m` filter removed from addopts (now redundant);
   `--dist loadscope` → `--dist load` to unblock workers immediately

6. behave_core / behave_shell rename — BEHAVE packages dropped the
   `decnet_` prefix; reinstalled editable installs and updated all 14
   import sites across profiler, ttp, bus, and correlation modules
2026-05-10 06:41:25 -04:00
f63aca4186 fix(test): reset _cached_backend before factory dispatch tests 2026-05-10 05:47:26 -04:00
95593cb804 fix(test): access DeckyRow.uuid as attribute, not dict key 2026-05-10 05:36:07 -04:00
16e032b7a5 fix(test): access LANRow.id as attribute, not dict key 2026-05-10 05:26:49 -04:00
967aec56d2 fix(bundle): prune node_modules during agent tarball walk 2026-05-10 05:17:32 -04:00
d3899dde96 fix(test): scrub DECNET_CORS_ORIGINS before domain-sections ini test 2026-05-10 05:17:00 -04:00
c2693aafc3 fix(clustering): filter extra fp keys before splatting into update_identity_fingerprints 2026-05-10 04:51:49 -04:00
92f43b4655 fix(fleet): update BASE_IMAGE test to allow digest-pinned image refs 2026-05-10 04:51:18 -04:00
f11def0af1 fix(collector): strip port from remote_addr before attacker identity resolution
host:port in remote_addr was creating a distinct Attacker row per TCP
connection instead of per IP. Split on the last ':' in parse_rfc5424;
preserve the port as fields['remote_port'] so repeated source ports are
retained as fingerprint signal in bounty payloads.
2026-05-10 04:06:42 -04:00
6a6f5807aa fix(pr3): adapt to quic-go v0.59.0 API — drop H3App, capture h3 SETTINGS via http3.Settingser
quic-go v0.59.0 (shipped with Caddy v2.11.2) removed quic.Connection as
a public interface and quic-go/logging as a public package, breaking
H3App's connection-wrapping approach.

Resolution:
- Remove H3App (h3app.go) entirely; Caddy handles h3 natively when h3
  is in the protocols list.
- Rewrite h3conn.go to keep only tryParseH3ControlStream + varint/name
  utilities (tested, useful for future stream-level tapping if the API
  ever re-exposes it).
- FPHandler.ServeHTTP: for h3 requests, type-assert ResponseWriter to
  http3.Settingser (the public interface exposed by quic-go/http3 v0.59),
  read the peer's Settings after ReceivedSettings channel closes, emit
  h3_settings fp record.
- https/entrypoint.sh: include h3 in CADDY_PROTOCOLS (Caddy now owns
  UDP/443); remove DECNET_H3_GLOBAL block.
- Update go.mod/go.sum to caddy v2.11.2 + quic-go v0.59.0.
- Update test_https_compose_h3_app.py to expect h3 in protocols when
  http/3 is selected, and assert decnet_h3 block is absent.
- All Go tests (9) and Python tests (15) remain green.
2026-05-10 03:43:34 -04:00
5675dd8ebc feat(pr3): canonical wire-order header capture for h1/h2 + H3App for SETTINGS
- Renames caddy.listeners.decnet_h2fp → decnet_fp; adds h1 raw-byte
  header capture (plainTappingConn) and h2 continuous HPACK decode loop
  (parseH2HeadersLoop) so headers_ordered reflects actual wire order, not
  Go map iteration order.
- Adds H3App Caddy module (decnet_h3) that owns UDP/443 via quic-go,
  wraps accepted QUIC connections with h3SettingsTappingConn to intercept
  the h3 control stream and extract RFC 9114 SETTINGS in wire order.
- Wires access_log emission from FPHandler.ServeHTTP via responseCapture.
- Updates syslog_bridge.py (canonical + per-service copies) with inline
  _compute_ja4h and new fp socket record branches: http_request_headers,
  h3_settings, access_log.
- Fixes ingester proto field alias (bridge emits 'proto', ingester expected
  'protocol') and exposes _process_fingerprint_bounties test alias.
- Go tests: h1/h2/h3 golden-byte tests all green; h3_tracer_test covers
  varint parser, GREASE detection, truncated-stream safety.
- Python tests: 15/15 green across bridge JA4H hash parity, ingester
  compat (old + new event shapes), and Caddyfile h3 template assertions.
2026-05-10 03:29:00 -04:00
8d1f26c0c7 fix(https): move Flask backend to 8443 to avoid netns conflict with http service on 8080 2026-05-10 02:31:08 -04:00
44ab42d80c fix(server): add from __future__ import annotations for Python <3.9 compat 2026-05-10 02:23:13 -04:00
d09b891a55 fix(syslog_bridge): add fp socket reader to canonical template — sync was overwriting per-service copies 2026-05-10 02:17:56 -04:00
42b5d97a50 fix(syslog_bridge): rewrite both templates with from __future__ annotations, fp socket imports, and start_fp_socket_reader 2026-05-10 02:06:53 -04:00
1669f25733 fix(syslog_bridge): add from __future__ import annotations for Python <3.9 compat 2026-05-10 01:58:43 -04:00
255ccebf29 fix(entrypoint): fail-fast if Flask does not bind within timeout instead of silently starting Caddy with no backend 2026-05-10 01:51:09 -04:00
d4f391bab1 fix(caddy): remove explicit tls from listener_wrappers — Caddy applies it by default 2026-05-10 01:45:03 -04:00
38cf1e6c6d fix(caddy+syslog): add UnmarshalCaddyfile to H2FP/FP handlers; add start_fp_socket_reader to syslog_bridge 2026-05-10 01:39:04 -04:00
6618b3c2a1 fix(topology): publish UDP/443 on gateway base when https service has http/3 enabled 2026-05-10 01:33:01 -04:00
7b54944fcc fix(https): remove ports from compose fragment — MACVLAN makes port publishing incompatible with network_mode 2026-05-10 01:29:46 -04:00
46963cbeec fix(deployer): chown synced _caddy_modules back to source owner after root copy 2026-05-10 01:26:13 -04:00
f2b0d286b3 fix(caddy): correct caddyhttp import path to modules/caddyhttp 2026-05-10 01:22:00 -04:00
f1ac1b4004 fix(deploy): sync _caddy_modules into http/https build contexts before compose up 2026-05-10 01:11:44 -04:00
3154224f68 fix(docker): hoist ARG BASE_IMAGE before first FROM so it scopes to all stages 2026-05-10 01:05:00 -04:00
724380901f fix(wizard): emit per-decky service config sections instead of prefix group
[decky.https] relied on ini_loader prefix-matching to propagate config
to decky-03/04/05 — silent and fragile. Now emits [decky-03.https],
[decky-04.https], [decky-05.https] explicitly so the INI is self-evident
and doesn't depend on pattern matching side-effects.
2026-05-10 01:00:43 -04:00
52a52eee78 fix(network): reload network before checking Containers on IPAM drift
networks.list() returns bare objects — Containers is always empty
without a reload(). The active-endpoint guard from the prior commit
never fired because it was checking a stale empty dict.
2026-05-10 00:56:56 -04:00
251181255b fix(network): reuse existing decnet_lan when active deckies are connected
Docker refuses network removal (403) when containers hold endpoints.
The old IPAM-drift path tried to disconnect+remove even with live
containers — disconnect silently failed, remove raised APIError.

Since DECNET assigns IPs explicitly in compose (never via Docker's
auto-assign pool), an ip_range mismatch on an existing same-driver
network is harmless. Bail out early and attach to the existing network
whenever Containers is non-empty.
2026-05-10 00:50:41 -04:00
92632d7afd feat(pr2): HTTP/2+HTTP/3 fingerprint extractors — JA4H, H2 SETTINGS, JA4-QUIC 2026-05-10 00:47:19 -04:00
0653e500b5 feat(services): HTTP/2 + HTTP/3 support via Caddy reverse-proxy
Swap Werkzeug for Caddy as the protocol layer for http and https decoy
services. Flask keeps owning app logic (fake_app, custom_body, headers,
syslog) on 127.0.0.1:8080; Caddy terminates h1/h2/h2c/h3 on the wire
with real-world TLS/QUIC fingerprints.

- Add `multi_enum` FieldType to ServiceConfigField + _coerce
- Add `http_versions` field to HTTPService (h1/h2c) and HTTPSService
  (h1/h2/h3); selecting h3 emits UDP/443 port mapping in compose
- Rewrite both Dockerfiles with multi-stage Caddy binary copy +
  setcap for port binding as the logrelay user
- Entrypoints parse HTTP_VERSIONS JSON, render a Caddyfile, start
  Flask in background, wait for it, then exec Caddy
- https/server.py drops direct TLS handling; Caddy owns the cert
- Add ProxyFix to both server.py so Flask sees real attacker IPs
- Frontend: multi_enum checkbox-group renderer in ServiceConfigFields;
  FormValue union extended to string[]; compactPayload skips []
- Fix stale test_smtp_relay_schema_matches_smtp: relay schema is a
  superset of smtp, not equal; update assertions accordingly
2026-05-10 00:04:37 -04:00
ec5b49144e fix(ui): transparent input bg fallback so light-mode text is legible 2026-05-09 23:24:37 -04:00
8dde954559 feat(ui): restyle LLMTab with DeckyFleet/PersonaGeneration form vocabulary 2026-05-09 23:23:25 -04:00
d1478f900c fix(ui): remove unused _SENTINEL from LLMTab 2026-05-09 23:21:29 -04:00
39eb1ce5db refactor(ui): move LLM provider config into Config tab, remove standalone route 2026-05-09 23:20:11 -04:00
c66749209f feat(ui): LLMConfig panel + route (/realism-llm) + nav entry 2026-05-09 23:15:27 -04:00
41b8e9b7b3 feat(realism/llm): GET/PUT /api/v1/realism/llm + worker hot-reload tick 2026-05-09 23:12:29 -04:00
155ab59ee8 feat(realism/llm): DB-backed LLMConfig, factory DB-first dispatch, Ollama HTTP mode 2026-05-09 23:09:36 -04:00
f10201e885 feat(secrets): Fernet encrypt/decrypt helper for DB-stored operator secrets 2026-05-09 23:07:24 -04:00
4c6b12dcf8 feat(stix_export): wire fingerprint bounties through all endpoints + tests
Remaining files from the fingerprint-bounties + characterizes-SRO commit:
misp_export, repository, bounties mixin, all 4 router endpoints, and test suite
updates. Prerequisite: previous commit added _extract_fingerprint_bounty_data
and the stix_export changes.
2026-05-09 09:14:48 -04:00
51d0fc7b6c feat(stix_export): HTTP quirks + JARM in protocol_fingerprints; characterizes SRO
Wire fingerprint bounties (JARM hashes, HTTP header quirks) from the bounties
table into the DecnetActorFingerprintExt.protocol_fingerprints group so the
sniffer/profiler-captured HTTP fingerprinting data surfaces in every STIX export.

Add a stix2.Relationship(relationship_type="characterizes") SRO linking each
x-decnet-behave-profile SDO back to its ThreatActor so graph-traversal tools
can follow the edge without relying on the bare x_decnet_behave_profile_ref
custom string property alone.

New repo surface:
- get_fingerprint_bounties_by_ip(ip) -> list[dict]
- get_all_fingerprint_bounties_for_export() -> dict[str, list[dict]]

All 4 export endpoints (per-attacker + fleet, STIX + MISP) extended with the
new gather slot. 50/50 tests green, mypy clean.
2026-05-09 09:14:29 -04:00
ef13e1fe4e test 2026-05-09 09:12:09 -04:00
97c99a4e03 feat(ttp): rich ThreatActor STIX extensions via CustomExtension + CustomObject
- stix_custom.py: DecnetActorFingerprintExt (@CustomExtension) wrapping
  network_behavior (os_guess/hop_distance/tcp_fingerprint/timing_stats/
  phase_sequence/behavior_class/beacon fields/tool_guesses) and
  protocol_fingerprints (ja3_hashes/hassh_hashes/kex_order_raw/
  ssh_client_banners/tls_cert_sha256/payload_simhashes/c2_endpoints).
  XDecnetBehaveProfile (@CustomObject x-decnet-behave-profile) carrying
  full BEHAVE-SHELL observation envelopes + kd_digraph_simhash.
  FINGERPRINT_EXT_DEF singleton extension-definition SDO.
- Drop legacy flat x_decnet_ja3_hashes / x_decnet_hassh_hashes /
  x_decnet_c2_endpoints (pre-v1, no consumers).
- stix_export: _threat_actor() wired to behavior + observations;
  build_attacker_bundle/build_fleet_bundle grow observations parameter.
- Repo: list_observations_by_attacker + get_all_observations_for_export
  abstract + sqlmodel impl; all four export endpoints extended.
- 18 new tests; inter-DECNET round-trip (stix2.parse → typed objects)
  is the primary fidelity assertion.
2026-05-09 08:52:19 -04:00
1200ac9132 feat(stix): STIX→MISP download export (per-attacker + fleet)
Adds GET /api/v1/attackers/{uuid}/export/misp and
GET /api/v1/attackers/export/misp backed by misp_export.py, which
converts existing STIX bundles to MISP events via misp-stix
ExternalSTIX2toMISPParser. Fleet endpoint emits {response:[...]}
collection (one event per attacker). Frontend: STIX/MISP buttons on
AttackerDetail header and Attackers list. 13 new tests green.
2026-05-09 08:04:25 -04:00
8990d9321d fix(ttp/stix): add Sighting SRO per process execution to link commands to threat-actor 2026-05-09 07:47:44 -04:00
d6a091be75 fix(ttp/stix): extract commands from both 'command' and 'command_text' keys 2026-05-09 07:43:44 -04:00
e548be3c49 feat(web): wire EXPORT button to fleet STIX endpoint 2026-05-09 07:40:07 -04:00
c210a56fc8 feat(ttp/stix): fleet-wide STIX 2.1 export — GET /api/v1/attackers/export/stix 2026-05-09 07:37:41 -04:00
f827197cc8 feat(ttp/stix): add deduped process SCOs for attacker commands 2026-05-09 07:33:30 -04:00
1ee7a4a481 fix(ttp/stix_export): _aware() handles ISO string timestamps from DB 2026-05-09 07:26:48 -04:00
915bc6d7ef feat(web): add Download STIX button to AttackerHeader 2026-05-09 07:24:59 -04:00
fe0ed4a251 feat(ttp): STIX 2.1 bundle export for individual attackers
GET /api/v1/attackers/{uuid}/export/stix returns a self-contained STIX
2.1 bundle: ip observation, threat-actor, ATT&CK attack-patterns with
canonical MITRE IDs, uses relationships, per-tag sightings, file SCOs
for artifacts, domain-name SCOs for SMTP targets, and a provider intel
note. Attack-pattern SDOs carry the MITRE bundle IDs so consumers
deduplicating against the public ATT&CK bundle get exact matches.
2026-05-09 07:21:22 -04:00
c4d6eb5bb3 feat(web): mitre_url deeplinks + lazy groups subpanel in TTPInspector
Every technique_id in TechniqueBar and TTPInspector now links to its
canonical attack.mitre.org page. The inspector drawer gains a GROUPS
subpanel that lazy-fetches the new /ttp/techniques/{id}/groups endpoint
and renders each MITRE-tracked intrusion-set with deeplink and aliases.

Centralizes TTP row interfaces into src/types/ttp.ts and API wrappers
into src/utils/ttpApi.ts to give the new GroupRef type a clean home and
avoid a third inline fetch declaration.
2026-05-09 06:57:10 -04:00
1d3086a5c7 feat(web): GET /api/v1/ttp/techniques/{id}/groups — MITRE-tracked groups using a technique
Surfaces the intrusion-set reverse index from the loaded ATT&CK
bundle: given a technique, returns the list of groups MITRE has
documented as using it. Read-only — explicitly NOT an attribution
claim about a DECNET attacker. The frontend pulls this lazily when
the operator expands a technique panel; payload-size cost on every
TTPTagDetailRow makes embedding wasteful for techniques with 50+
documented groups.

- decnet/web/router/ttp/api_get_groups_for_technique.py exposes
  GET /api/v1/ttp/techniques/{technique_id}/groups, response_model
  list[GroupRef]. Same JWT-viewer auth gating as the rest of the
  TTP router. 404 when the technique_id doesn't resolve in the
  bundle.
- Sub-techniques are queried directly (no auto-union with parent)
  to match ATT&CK Navigator semantics; callers that want a broader
  view query the parent themselves.
- tests/ttp/test_groups_for_technique.py covers happy path, 404,
  sub-technique attribution independence, empty-list-on-zero-groups,
  and that responses include mitre_url + aliases.
- tests/web/test_api_attackers.py: fix pre-existing fixture drift
  introduced by a2a61b63 — three TestGetAttackerDetail cases were
  missing AsyncMock for repo.latest_observation_per_primitive,
  causing TypeError on await of MagicMock. The new groups endpoint
  doesn't share code with attacker_detail; this is a drive-by fix
  surfaced by the same suite run.
2026-05-09 06:45:25 -04:00
84a075e405 feat(ttp): promote mitre_url to first-class TTPTag column + propagate everywhere
Phase 2 attached mitre_url to intel-emitted tags' evidence JSON;
Phase 3 promotes it to a real column populated for *every* tag —
intel, credential, behavioral, canary, identity, email, rule-engine —
from one source. Pre-v1, so the SQLModel field is added directly
without an Alembic migration.

- TTPTag gains mitre_url: Optional[str] (not indexed — derived
  deeplink, not a query target; technique_id is already indexed).
- _emit.py and rule_engine._evaluate_rules both populate mitre_url
  via attack_stix.mitre_url_for(sub_technique_id or technique_id).
  Sub-technique URL when present, else parent. The two construction
  sites stay separate because the rule_engine path carries per-emit
  span instrumentation that emit_tags() can't preserve without
  threading a span object through; minimal-change beats forced
  refactor here.
- intel_lifter strips mitre_url from evidence_extra in all four
  decision functions. The column is canonical now; duplicating in
  the JSON column would drift when the bundle moves. The unused
  TechniqueEmission import + tracking dicts removed too.
- IdentityTechniqueRow / TechniqueRollupRow / TTPTagDetailRow /
  CampaignTechniqueRow gain mitre_url: Optional[str].
- sqlmodel_repo/ttp.py:_mitre_url_for added; the 5 row-builder sites
  pass mitre_url=_mitre_url_for(sub_technique_id or technique_id)
  alongside the existing technique_name resolution.
- api_get_tag_details.py needs no change — list_tags_by_scope_and
  _technique already returns model_dump() rows that flow the new
  column through **row spread to TTPTagDetailRow.
- tests/ttp/test_emit_attaches_mitre_url.py covers both construction
  paths (top-level, sub-tech, unknown, multi-emit) and a regression
  test that intel_lifter evidence dicts no longer contain mitre_url.
2026-05-09 06:40:08 -04:00
9675f4bf92 refactor(decnet_web/MazeNET): bump coverage floor after Inspector split
Suite is now 51 files / 259 tests, 25.68% lines / 21.43% branches.
Floor: lines 24->25, functions 21->22, branches 19->21,
statements 23->24. Inspector/index.tsx ends at 172 LOC, the only
other > 250 LOC file in MazeNET/ is NodeInspector (362) — the
node branch was the bulk of the original 606 LOC and its 7
add-service / tarpit form states stay co-located there.
2026-05-09 06:34:02 -04:00
4fbce6a8b0 refactor(decnet_web/MazeNET): split Inspector by selection type
Inspector.tsx (606 LOC) splits into Inspector/{NetInspector,
NodeInspector, EdgeInspector, ServiceInspector, index}.tsx plus
types.ts. The dispatcher (index.tsx) owns the title bar, the empty
state, the activeNetIds derivation, the pending-diff block, and the
topology-status block; each per-type panel takes only the props it
needs. NodeInspector keeps the 7 useStates for the add-service /
tarpit forms since they are node-only.

10 new dispatcher-level tests cover empty / node / net / edge /
service / observed-entity / internet-net / live-ops gating /
tarpit-controls / pending-diff. Selection type re-exported from
Inspector/index.tsx so MazeNET.tsx, Canvas.tsx, and
useMazeContextMenu.tsx keep their existing import path.
2026-05-09 06:33:12 -04:00
e50474cb66 feat(ttp): add mitre_url_for + groups_using_technique helpers
Two reusable bundle-derived lookups that the next two commits build
on:

- mitre_url_for(tid) returns the canonical attack.mitre.org URL by
  reading external_references on the cached attack-pattern. Backed
  by the existing lru-cached _attack_pattern_by_id so per-call cost
  is constant. Handles top-level techniques and sub-techniques
  (T1059.004 -> .../techniques/T1059/004).
- GroupRef + groups_using_technique(tid) surface the intrusion-set
  reverse index from the loaded bundle: given a technique, return
  the MITRE-tracked groups documented as using it. Sorted by
  group_id for deterministic responses; lru-cached. Sub-technique
  semantics match ATT&CK Navigator (do NOT auto-union with parent).
- decnet/ttp/data/intel_loader._mitre_url_for collapses to a thin
  re-export of attack_stix.mitre_url_for; the loader keeps mitre_url
  on TechniqueEmission for the eventual STIX export.
- tests/ttp/test_attack_url.py covers both helpers: top-level + sub
  URLs, unknown -> None / (), GroupRef immutability + hashability,
  deterministic ordering, sub-technique distinct from parent.
2026-05-09 06:32:04 -04:00
0a9a2f9021 refactor(decnet_web/AttackerDetail): trim shell + bump coverage floor
Drop unused icon/api/useEffect/Tag imports left behind by the
fingerprint, behaviour, and IntelPanel extractions. AttackerDetail.tsx
ends at 450 LOC across Phase 10 (down from 1652 / 73% reduction).
Coverage floor: lines 23->24, functions 20->21, branches 17->19,
statements 22->23.
2026-05-09 06:29:15 -04:00
4bd502d3bf refactor(decnet_web/AttackerDetail): lift IntelPanel
Move IntelPanel + IntelRow type + ProviderRow + VERDICT_TONE/fmtTs
helpers into AttackerDetail/IntelPanel/. AttackerDetail.tsx drops
from 680 to 449 LOC. New IntelPanel.test.tsx covers the loading,
absent (404), error (500), and ok states with MSW handlers.
2026-05-09 06:27:59 -04:00
e92d415304 refactor(decnet_web/AttackerDetail): lift behaviour panel block
Move BehaviouralPrimitivesPanel + 8 sub-components (BehaviorHeadline,
BeaconBlock, DetectedToolsBlock, TcpStackBlock, TimingStatsBlock,
PhaseSequenceBlock, AttributionBadge, KeyValueRow, StatBlock) plus
the OS_LABELS / BEHAVIOR_LABELS / TOOL_LABELS / BEHAVIOUR_DOMAIN_*
lookup tables and fmtOpt/fmtSecs into AttackerDetail/behaviour/.
AttackerDetail.tsx drops from 1220 to 680 LOC; existing
behaviour_panel test moves to behaviour/BehaviouralPrimitivesPanel.test.tsx
and now imports from the canonical location. The shell still
re-exports BehaviouralPrimitivesPanel for source compatibility.
2026-05-09 06:25:53 -04:00
1f3f58c42c refactor(decnet_web/AttackerDetail): lift fingerprint renderers + tests
Move 12 Fp* components, FingerprintGroup, getPayload, seqClassColor,
HashRow, fpType lookups, and UA color tables into
AttackerDetail/fingerprints/. AttackerDetail.tsx drops from 1652
to 1220 LOC; the orchestrator now imports the same helpers it used
to define inline. 10 new tests covering UA / HTTP-quirks / resumption
/ certificate / spoofed-source / TCP-stack / dispatch fallback.
2026-05-09 06:21:44 -04:00
d25f69ba1b feat(ttp): extract intel_lifter provider mappings to YAML data + ATT&CK external_reference enrichment
The four provider→technique tables (AbuseIPDB cat→techniques,
GreyNoise tag→techniques, ThreatFox threat_type→techniques, plus
the Feodo binary-listed signal) used to live as Final[dict] constants
in intel_lifter.py. Two real problems with that:

1. Drift between rules/ttp/R0054.yaml..R0058.yaml (which declare
   the full slate per provider) and the Python dicts (which decide
   which slate-member fires per signal). The v2 audit comment in
   intel_lifter.py documented that they had silently drifted.
2. No ATT&CK provenance on emissions — the loaded STIX bundle has
   rich external_references (canonical attack.mitre.org URLs) that
   never surfaced because the lifter had no path back to them.

Mappings now live as YAML at decnet/ttp/data/intel/{provider}.yaml,
validated at load against the loaded ATT&CK bundle, with each entry
enriched by attack_stix._attack_pattern_by_id to attach the canonical
MITRE URL to every emission.

- decnet/ttp/data/intel_loader.py: pydantic-validated schema +
  ProviderMapping/Signal/TechniqueEmission frozen dataclasses +
  load_provider_mapping(provider) lru-cached.
- Per-technique high_score_threshold inlined into YAML
  (collapses the separate _ABUSEIPDB_HIGH_SCORE_GATED dict).
- external_reference field follows the STIX 2.1 external-reference
  shape (source_name + url + optional external_id) so the future
  STIX/MISP exporter is a direct translation.
- intel_lifter.py: dicts deleted, decision functions read from
  ProviderMapping accessors. Decision-flow constants (T1071/T1595
  bare-classification fallbacks in _greynoise_decisions) stay in
  code — they're not table rows.
- Each emit slot's evidence_extra now carries mitre_url for any
  technique resolved in the bundle (every one in practice).
- tests/ttp/test_intel_mappings.py: snapshot equivalence vs the
  legacy dicts, high-score gate behavior, every-signal-has-an-
  external-reference, every-emission-has-a-mitre-url, negative
  paths (unknown technique_id raises AttackBundleError, mismatched
  provider field rejected, dir listing matches expected providers).

The YAML schema + mitre_url enrichment lays groundwork for the
future STIX exporter; this commit does NOT build that exporter.
2026-05-09 06:18:25 -04:00
a3f1cea2d6 feat(ttp): fetch + verify MITRE ATT&CK LICENSE alongside the bundle
MITRE's ATT&CK Terms of Use require reproducing their copyright +
license alongside any cached copy of ATT&CK data. Today we ship the
bundle but not the license — this commit closes that compliance gap.

- attack_version.py pins ATTACK_LICENSE_URL +
  ATTACK_LICENSE_SHA256 + ATTACK_LICENSE_FILENAME, sourced from the
  same attack-stix-data repo as the bundle.
- attack_stix.py:_fetch_license downloads LICENSE.txt next to the
  bundle. License sha mismatch is logged + refreshed (license text
  gets occasional formatting tweaks; not a security event), unlike
  the bundle which stays fail-closed.
- _ensure_license is the compliance ratchet: resolve_bundle_path
  refuses to return without LICENSE.txt on disk. Override-mode
  (DECNET_ATTACK_BUNDLE) checks for a sibling LICENSE.txt first,
  then DECNET_ATTACK_LICENSE, then the cache dir.
- python -m decnet.ttp.attack_stix license prints the cached license
  to stdout for operator audit.
- loaded_license_path() exposes the active license path read-only.
- tests/ttp/test_attack_license.py covers happy paths (sibling +
  explicit env), refusal when DECNET_ATTACK_LICENSE points at a
  missing file, the CLI subcommand, and the pinned-sha shape.
2026-05-09 06:17:46 -04:00
b326d70852 refactor(decnet_web/Credentials): wire shell + bump coverage floor
Credentials.tsx: 487 -> 231 LOC. Page now composes CredsTable +
ReuseTable + useCredentials hook; URL-derived state (tab, query,
service, page) and selection/sort UI are the only concerns left
in the shell.
2026-05-09 06:12:58 -04:00
bf79581cc9 refactor(decnet_web/Credentials): extract CredsTable + ReuseTable + SortTh 2026-05-09 06:11:38 -04:00
e29a0094c9 refactor(decnet_web/Credentials): add useCredentials data hook 2026-05-09 06:10:53 -04:00
275fac5288 refactor(decnet_web/Credentials): extract types + helpers with tests 2026-05-09 06:09:57 -04:00
2c1ccec8fa refactor(decnet_web/SwarmHosts): wire shell + bump coverage floor
SwarmHosts.tsx: 513 -> 161 LOC. Page now composes EnrollmentWizard
+ useSwarmHosts hook; only the arm/confirm UI affordance and the
busy-set tracking remain in the shell.
2026-05-09 06:08:48 -04:00
780d395a46 refactor(decnet_web/SwarmHosts): add useSwarmHosts polled data hook 2026-05-09 06:07:38 -04:00
9def7fd22f refactor(decnet_web/SwarmHosts): extract EnrollmentWizard 2026-05-09 06:06:29 -04:00
3a8519b2a1 refactor(decnet_web/SwarmHosts): extract types + helpers with tests 2026-05-09 06:05:19 -04:00
31f4c54c32 refactor(decnet_web/Webhooks): wire shell + bump coverage floor
Webhooks.tsx: 642 -> 387 LOC. Page now composes FormRow + SecretModal
+ useWebhooks hook; toast policy is the only UI concern left in the
shell. Multi-select delete uses the hook's reload internally.
2026-05-09 06:04:14 -04:00
7408a04a90 refactor(decnet_web/Webhooks): add useWebhooks data hook 2026-05-09 06:02:37 -04:00
1ac64d2ae2 refactor(decnet_web/Webhooks): extract FormRow + SecretModal 2026-05-09 06:01:47 -04:00
432057f44a feat(ttp): fail-closed validation that lifter+UKC IDs resolve in ATT&CK bundle
Drift between the technique/tactic IDs hardcoded in the lifters and
what the loaded ATT&CK STIX bundle actually contains is silent in the
status quo: a renamed-or-retired technique just stops being tagged.
Every emission point now has an explicit validator that asserts its
IDs resolve in the loaded bundle, called once at TTP-worker boot.

- intel_lifter.all_emitted_technique_ids() collects every technique
  the four provider tables (AbuseIPDB / GreyNoise / Feodo / ThreatFox)
  plus the decision-flow constants in _greynoise_decisions and
  _feodo_decisions can emit. validate_against_attack_bundle() runs it
  through attack_stix.assert_known_technique_ids().
- ukc.validate_against_attack_bundle() asserts every key in
  ATTACK_TACTIC_TO_UKC resolves, with TA0100..TA0106 documented as
  _NON_ENTERPRISE_TACTICS (lives in the ICS bundle, not the
  enterprise bundle DECNET loads).
- decnet/ttp/worker.py:run_ttp_worker_loop calls both validators
  before subscribing to the bus. A bundle-vs-code mismatch refuses
  to start the worker rather than silently mistagging events.
- tests/ttp/test_attack_bundle_validation.py covers the happy path
  for both validators, the negative path (injected bogus tactic ID
  raises AttackBundleError), the ICS exemption, and the lone T1078
  reference in credential_lifter.
2026-05-09 05:58:06 -04:00
d743d38cac feat(ttp): load MITRE ATT&CK from official STIX 2.1 bundle
Replace the hand-maintained TECHNIQUE_NAMES dict (pinned to v15.1) with
a runtime loader that reads the official enterprise-attack-N.json STIX
bundle. Version bumps now require only updating attack_version.py;
sub-technique parents, tactic IDs, and kill-chain phases all come from
MITRE's published data.

- decnet/ttp/attack_version.py pins version 19.0 + sha256 + URL
- decnet/ttp/attack_stix.py is the lazy STIX loader. Resolution order:
  DECNET_ATTACK_BUNDLE env -> ~/.cache/decnet/attack/ -> fetch from
  the pinned MITRE GitHub URL. SHA-256 verified before parse;
  mismatch fails closed.
- decnet/ttp/attack_catalog.py collapses to a shim re-exporting
  technique_name() so the ~9 router/repo call sites don't churn.
- python -m decnet.ttp.attack_stix fetch warms the cache and can
  print sha256 for version-bump workflows.
- test_attack_catalog.py now asserts every rule-emitted ID resolves
  in the loaded bundle (same contract, real source) and exercises
  the SHA-256-mismatch fail-closed path.
2026-05-09 05:54:36 -04:00
44f4dd8c85 refactor(decnet_web/Webhooks): extract types + helpers with tests 2026-05-09 05:49:34 -04:00
ac64329a13 refactor(decnet_web/PersonaGeneration): wire shell + bump coverage floor
PersonaGeneration.tsx: 875 -> 357 LOC. Page now composes the data
hook + PersonaCard + PersonaEditor; bulk-import helpers stay in
helpers.ts; toast policy is the only UI concern left in the shell.
2026-05-09 05:48:16 -04:00
c1a65bf9a3 refactor(decnet_web/PersonaGeneration): add usePersonaGeneration data hook 2026-05-09 05:46:08 -04:00
97e72d975b refactor(decnet_web/PersonaGeneration): extract PersonaCard + PersonaEditor 2026-05-09 05:45:10 -04:00
a19d8bba17 refactor(decnet_web/PersonaGeneration): extract types + helpers with tests 2026-05-09 05:43:59 -04:00
6e0e1c204e refactor(decnet_web/MazeNET): wire hooks + bump coverage floor
Final integration step. The MazeNET page shell is now a thinner
composition of the existing module-level hooks (useMazeApi,
useMazeInteraction, useTopologyEditor, useTopologyStream,
useLayoutPersistor) PLUS the three new ones from this phase
(useFullscreenMode, useTopologyData, useMazeContextMenu).

- MazeNET.tsx: 980 -> 715 LOC. The fullscreen + body-class
  effects, the topology hydrate / SSE stream / deploy /
  flashErr plumbing, and the four context-menu builders are
  all gone from the shell.
- Page still owns the per-operation editor callbacks
  (removeNet/Node/Edge, duplicateNode, addServiceToNode, etc.)
  because they need direct access to setNodes/setEdges/setNets
  for optimistic patches alongside their REST calls — those
  setters are exposed by useTopologyData for that reason.

Coverage floor bumped after the phase:

  lines       17 -> 19
  functions   15 -> 17
  branches    13 -> 14
  statements  16 -> 18

Phase 5 final scoreboard: 37 test files, 172 tests, all green.
2026-05-09 05:39:32 -04:00
f33a011900 refactor(decnet_web/MazeNET): extract useMazeContextMenu
Lift the context-menu builder out of the page shell. The hook
owns ctxMenu open/close state and exposes one builder per
surface (node / net / edge / canvas); the actual operations come
in via callbacks so the page keeps its optimistic-patch logic
unchanged.

- New MazeNET/useMazeContextMenu.tsx
- useMazeContextMenu.test.ts covers menu lifecycle (open/close),
  node-menu items, observed-entity locking, internet-net
  delete-disabled, canvas-menu Add subnet/DMZ items, and the
  edge-menu Remove invocation.
- Wiring into MazeNET.tsx lands next.
2026-05-09 05:34:58 -04:00
5f2a3f4629 refactor(decnet_web/MazeNET): extract useTopologyData
Lift the canvas data plane off the page shell. The hook owns:

  GET /topologies/:id            (hydrates nets/nodes/edges + meta)
  GET services + archetypes      (catalogs, with bundled fallback)
  POST /topologies/:id/deploy
  /topologies/:id/events SSE     (open only when active/degraded)
  flashErr() banner timer        (auto-clears actionErr after 4s)

State setters for nets / nodes / edges are returned so the
per-operation callbacks living in the page can optimistically
patch local state alongside their REST calls (matches the
existing pattern; wholesale lift would mean dragging every
mutation along too).

- New MazeNET/useTopologyData.ts
- useTopologyData.test.ts covers hydrate, loadErr surfacing,
  streamEnabled gating on active/degraded, onDeploy success +
  error paths, and the flashErr 4s auto-clear with fake timers.
- Wiring into MazeNET.tsx lands in the next commit.
2026-05-09 05:33:19 -04:00
212feb49e2 refactor(decnet_web/MazeNET): extract useFullscreenMode
Lift the four fullscreen-related side-effects off the page shell.
The hook owns:

  1. body class toggle so page CSS can hide its chrome
  2. browser fullscreen API request/exit (failures ignored)
  3. fullscreenchange listener so F11/Esc from outside our button
     keeps internal state in sync
  4. Esc keystroke handler

Returns { fullscreen, setFullscreen, toggle }.

- New MazeNET/useFullscreenMode.ts
- useFullscreenMode.test.ts (jsdom) covers initial toggle, body
  class lifecycle, Esc-to-exit, and unmount cleanup.
- MazeNET.tsx loses ~30 LOC of inline state + effects.
2026-05-09 05:31:39 -04:00
171e20e427 refactor(decnet_web/Config): wire hook + bump coverage floor
Final integration. The page shell is now a thin composition of
useConfig + the previously-extracted children:

- Config.tsx: 989 -> 131 LOC. Page owns only the activeTab state
  (and the "drop the users tab if the server didn't send users"
  effect). Every form lives inside its tab; toast wiring lives
  in AppearanceTab; window.alert calls live inside UsersTab.
- Tabs receive their `onSave* / onAddUser / ...` callbacks
  directly from the hook — no intermediate wrapper handlers.

Coverage floor bumped after the split:

  lines       14 -> 17
  functions   13 -> 15
  branches    11 -> 13
  statements  13 -> 16

Phase 4 final scoreboard: 34 test files, 156 tests, all green.
2026-05-09 05:27:47 -04:00
4a9cd90f90 refactor(decnet_web/Config): extract AppearanceTab
APPEARANCE panel — accent-color picker — into its own tab. State
is local since no other tab cares about the value; localStorage
persistence + the document.documentElement[data-accent] mirror
move along with it.

- New Config/tabs/AppearanceTab.tsx
- AppearanceTab.test.tsx covers the matrix default, reading the
  saved accent from localStorage on mount, and the click-to-flip
  flow writing both localStorage and the html data-accent attr.
2026-05-09 05:26:26 -04:00
ccae1612bd refactor(decnet_web/Config): extract GlobalsTab + DangerZone
GLOBAL VALUES panel + the developer-mode-gated DANGER ZONE
(reinit) into one tab file. Two stacked panels because they're
the two pieces of UX you ever see together on the globals tab;
splitting them into separate components would force the page
shell to re-pick the gating predicate.

- New Config/tabs/GlobalsTab.tsx (mutation-interval + DangerZone
  inline, since DangerZone is reinit-specific and won't be reused)
- GlobalsTab.test.tsx covers interval-format validation, the
  DANGER ZONE gating on developer_mode, the two-step reinit
  confirm flow, the totals chip ("PURGED: N logs, N bounties,
  N attacker profiles") on success, and viewer-mode rendering.
2026-05-09 05:25:49 -04:00
be35228191 refactor(decnet_web/Config): extract UsersTab
USER MANAGEMENT panel into its own tab. Owns the per-row UI
state (delete-confirm, reset-password popup) plus the add-user
form state; mutations come in via prop. Errors on per-row
operations stay on window.alert (matches existing behavior); the
add form uses the inline FormMsg chip.

- New Config/tabs/UsersTab.tsx
- UsersTab.test.tsx covers row rendering with the must-change
  badge, the two-step delete confirm flow, the add-user submit
  payload (trimmed username + selected role), and the success
  chip after a successful add.
2026-05-09 05:24:55 -04:00
8807da218b refactor(decnet_web/Config): extract LimitsTab
DEPLOYMENT LIMITS panel into its own tab file. Owns the input
state, preset-button shortcuts, and the inline FormMsg chip; the
hook mutation is passed in via prop so this component is fully
reusable as a presentation-only piece.

- New Config/tabs/LimitsTab.tsx
- LimitsTab.test.tsx covers viewer-vs-admin rendering, the
  1-500 validation message, and success/error chip display.
2026-05-09 05:23:42 -04:00
f2fd314dd6 refactor(decnet_web/Config): extract useConfig data hook
Lift the GET /config fetch and every admin-side mutation off the
page shell:

  GET    /config
  PUT    /config/deployment-limit
  PUT    /config/global-mutation-interval
  POST   /config/users
  DELETE /config/users/:uuid
  PUT    /config/users/:uuid/role
  PUT    /config/users/:uuid/reset-password
  DELETE /config/reinit (returns { logs, bounties, attackers })

Mutations return { ok: true } | { ok: false; reason: string } so
the upcoming tab components can render the inline FormMsg chip
without touching axios error shapes. reinit additionally returns
the deletion totals so the danger-zone confirmation can echo
"PURGED: N logs, N bounties, N attackers".

- New Config/useConfig.ts
- useConfig.test.ts MSW-covers initial load, isAdmin role
  surfacing, setDeploymentLimit ok + 400 paths, addUser, deleteUser
  refused, and reinit success.
- Wiring into Config.tsx + tab extractions land in follow-up commits.
2026-05-09 05:23:04 -04:00
b1fbf4630e refactor(decnet_web/Config): move WorkersPanel out
Verbatim move of the worker-status pollster (~390 LOC) plus its
RealismBadge sidekick into its own file. Owns its own polling +
stop/start/start-all mutations; toast push comes in via prop so
the parent stays the one source of toast tone.

- New Config/WorkersPanel.tsx
- WorkersPanel.test.tsx (MSW) covers worker-row rendering, the
  BUS OFFLINE banner, and the error panel on /workers 500.
- Config.tsx loses the inline WorkersPanel + RealismBadge plus
  the now-unused icon imports (Square, RefreshCw, Play).
2026-05-09 05:22:10 -04:00
209efd1a74 refactor(decnet_web/Config): extract types
Foundation for the Config split. UserEntry / ConfigData move out
of the page so the upcoming hook + tab extractions can import
without reaching back through Config.tsx. New ConfigTab union and
FormMsg type for the inline success/error chip pattern that
repeats across every admin form on the page.

- New Config/types.ts (UserEntry, ConfigData, ConfigTab, FormMsg)
- Config.tsx loses the inline interfaces and the `as any` cast on
  setActiveTab in the tab-switcher.
2026-05-09 05:19:50 -04:00
6ba12cc571 refactor(decnet_web/CanaryTokens): wire hook + bump coverage floor
Final integration step. The page shell is now a thin composition
of useCanaryTokens + the previously-extracted children:

- CanaryTokens.tsx: 1,334 -> 210 LOC. Page owns only the
  pure-UI state (tab, search/state/scope filters, modal
  visibility, drawer selection, local fileDrops log) and the
  thin handlers that translate hook results into confirm/alert
  prompts. Initial parallel fetch + deleteBlob mutation moved
  to useCanaryTokens in the prior commit.
- Modals plug directly into the hook's optimistic helpers
  (prependToken / prependBlob / markTokenRevoked) so the page
  doesn't reach into the data shape.

Coverage floor bumped after the split:

  lines       11 -> 14
  functions   10 -> 13
  branches     8 -> 11
  statements  11 -> 13

Phase 3 final scoreboard: 28 test files, 131 tests, all green.
2026-05-09 05:17:52 -04:00
c5cbe084cb refactor(decnet_web/CanaryTokens): extract list views
Lift the three tab bodies — tokens, blobs, file drops — into
their own files. Each takes plain props (data + the operations
its rows need), so the page shell stops mixing tab markup with
data plumbing.

- New CanaryTokens/TokenListView.tsx (text search + state/scope
  filter selectors + flat row grid; visibleTokens memo lives here
  now). Exports StateFilter / ScopeFilter union types so the page
  can declare its filter useState with the right shape.
- New CanaryTokens/BlobListView.tsx (delete refused while a token
  references a blob; ref count badge reuses the disabled button).
- New CanaryTokens/FileDropListView.tsx (CLEAR LIST hidden when
  the local log is empty).
- Three companion tests cover empty states, filter behavior,
  delete refused-vs-allowed, and the per-tab callback wiring.

Wiring into CanaryTokens.tsx + the hook lands next.
2026-05-09 05:16:18 -04:00
0c8c74a89d refactor(decnet_web/CanaryTokens): extract useCanaryTokens hook
Lift the parallel initial-load fetch and the deleteBlob mutation
off the page shell. Modal-driven optimistic merges (created
token, uploaded blob, drawer-revoked token) flow through narrow
setter helpers so the modals don't have to know how state is
shaped internally.

  GET    /canary/tokens
  GET    /canary/blobs (silent 403 -> empty list, viewer-friendly)
  GET    /deckies
  GET    /topologies/?status=active
  DELETE /canary/blobs/:uuid

deleteBlob returns { ok, reason } so the page can branch the
toast/alert tone without seeing the axios error type. Wiring
into CanaryTokens.tsx lands in the next commit.

- New CanaryTokens/useCanaryTokens.ts
- useCanaryTokens.test.ts MSW-covers happy load, viewer 403 ->
  empty blobs, deleteBlob ok + refused-with-detail paths, and the
  markTokenRevoked optimistic write.
2026-05-09 05:14:48 -04:00
69f547f75e refactor(decnet_web/CanaryTokens): move FileDropModal + LS helpers
Verbatim move of the file-drop modal (~310 LOC) and its
localStorage glue (FILEDROP_LS_KEY, FileDropEntry type,
loadFileDrops, saveFileDrops) into one file. The list view that
shows these entries lives in the page; the persistence layer
travels with the writer.

- New CanaryTokens/FileDropModal.tsx (modal + LS helpers + entry type)
- FileDropModal.test.tsx covers loadFileDrops empty / round-trip /
  200-row cap / malformed-JSON, plus modal title rendering, the
  bypass-warning banner, and CANCEL -> onClose.
- CanaryTokens.tsx loses the inline modal + LS glue plus the
  now-unused imports (useRef/X/AlertTriangle/useEscapeKey/
  useFocusTrap, plus BTN_PRIMARY/BTN_GHOST/Field that only the
  modals consumed).
2026-05-09 05:13:51 -04:00
b664655dcb refactor(decnet_web/CanaryTokens): move UploadModal out
Verbatim move of the artifact upload modal (~130 LOC) into its
own file. Drop-or-browse picker, server-side-injection warning
banner, and the multipart POST stay unchanged.

- New CanaryTokens/UploadModal.tsx
- UploadModal.test.tsx covers title rendering, empty drop-zone
  hint, server-injection warning banner, UPLOAD-disabled-until-
  file, and CANCEL -> onClose.
2026-05-09 05:11:46 -04:00
e30455551d refactor(decnet_web/CanaryTokens): move CreateTokenModal out
Verbatim move of the canary-token creation modal (~280 LOC) into
its own file. Renamed from CreateModal to CreateTokenModal so the
component name carries scope across the package boundary.

- New CanaryTokens/CreateTokenModal.tsx
- CreateTokenModal.test.tsx covers title rendering, CANCEL ->
  onClose, empty-deckies hint, and the Operator-upload mode
  switch revealing the no-blobs message. useFocusTrap is
  vi.mock'd to avoid jsdom focus shenanigans.
- CanaryTokens.tsx loses the inline modal + its now-unused
  imports (KNOWN_GENERATORS, KIND_OPTIONS, GeneratorName).
2026-05-09 05:10:27 -04:00
a35048b174 refactor(decnet_web/CanaryTokens): extract types + helpers + ui
Foundation for the CanaryTokens split. Types, error/format helpers,
and the inline style + small primitives move out of the page so
the upcoming modal/list extractions can import without reaching
back through CanaryTokens.tsx.

- New CanaryTokens/types.ts (BlobRow, DeckyOption, TopologyOption,
  Scope, KNOWN_GENERATORS / GeneratorName, KIND_OPTIONS, STATE_COLOR)
- New CanaryTokens/helpers.ts (extractError, fmt, fmtBytes)
- New CanaryTokens/ui.tsx (INPUT_STYLE, BTN_PRIMARY, BTN_GHOST,
  Field, Stat)
- CanaryTokens.tsx loses ~110 LOC of inline definitions; behavior
  unchanged.
2026-05-09 05:08:37 -04:00
08c274486e test(decnet_web): raise coverage floor after DeckyFleet split
Phase 2 lands. DeckyFleet.tsx dropped from 1,674 to 274 LOC; the
fleet page is now a thin composition of useDeckyFleet + 6
extracted children (DeckyInspectPanel, IntervalEditor, DeckyCard,
DeployWizard, DeckyFilters, DeckyGridEmpty), each with co-located
tests.

Lock the gain by bumping the threshold floor in vite.config.ts:

  lines       7  -> 11
  functions   6  -> 10
  branches    5  -> 8
  statements  7  -> 11

Phase 2 final scoreboard: 21 test files, 98 tests, all green.
2026-05-09 05:06:08 -04:00
9da6f6983e refactor(decnet_web/DeckyFleet): wire hook + extract filter UI
Final integration step. The page shell is now a thin composition
of the hook + the previously-extracted children:

- DeckyFleet.tsx: 1,674 -> 274 LOC. Page owns only the
  pure-UI state (filter, search, armed-confirm, modal visibility,
  selected-card-for-inspect) and the toast-wrapping handlers that
  translate hook results into toast tone. Polling, REST plumbing,
  role lookup, and archetype catalog all moved to useDeckyFleet
  in the prior commit.
- New DeckyFilters.tsx (header pill row + DEPLOY shortcut) +
  DeckyGridEmpty.tsx (fleet-empty vs. filter-empty copy).
- DeckyFilters.test.tsx + DeckyGridEmpty.test.tsx cover count
  rendering, filter-click callbacks, and admin-gated DEPLOY
  visibility.

Two-step teardown arming logic stays in the page (it's pure UI).
Toast tone branching on { ok, reason } from useDeckyFleet
results moves the policy decision out of the data layer.
2026-05-09 05:05:31 -04:00
9ddeb1a08c refactor(decnet_web/DeckyFleet): extract useDeckyFleet data hook
Lift every read- and write-side data flow off the page shell:

  GET  /system/deployment-mode  (decides which list endpoint to hit)
  GET  /deckies | /swarm/deckies (mode-switched + shape-normalized)
  GET  /config (role -> isAdmin)
  GET  /topologies/archetypes (live catalog with bundled fallback)
  POST /deckies/:name/mutate
  PUT  /deckies/:name/mutate-interval
  POST /swarm/hosts/:uuid/teardown
  10s polling loop refreshing mode + list

Operations return discriminated results ({ok:true} | {ok:false,
reason:...}) so the page can branch toast tone without seeing the
axios error type. Toasts, arm-confirm, and modal visibility stay
in the consuming page — the hook is pure data.

- New DeckyFleet/useDeckyFleet.ts
- useDeckyFleet.test.ts MSW-covers initial load, swarm-mode shape
  normalization, mutate ok/error paths, teardown ok path, and
  applyServicesChange optimistic write.
- DeckyFleet.tsx wiring lands in the next commit so the diff stays
  reviewable.
2026-05-09 05:03:31 -04:00
1e2bc41ab1 refactor(decnet_web/DeckyFleet): move DeployWizard out
Lift the multi-step deploy wizard (~520 LOC) plus its private
INI-builder helpers (PLACEHOLDER_LINES, b64encodeUtf8, buildIni,
PickMode type) into their own file. Verbatim move; the
underscore-prefixed helpers drop the leading underscore now that
they're file-local rather than competing with hoisted parent
constants.

- New DeckyFleet/DeployWizard.tsx
- DeployWizard.test.tsx covers the closed render guard, the
  open-at-step-0 archetype list, NEXT-disabled-until-archetype,
  and CANCEL -> onClose. ServiceConfigFields is vi.mock'd to a
  stub since it pulls schemas via api.get() that are out of
  scope for these tests.
- DeckyFleet.tsx loses the wizard plus the now-unused imports
  (DEFAULT_SERVICES, Modal, PickIcon, ServiceConfigFields and
  its type aliases).
2026-05-09 05:01:33 -04:00
849caffaf1 refactor(decnet_web/DeckyFleet): move DeckyCard out
Lift the per-decky tile (~430 LOC) into its own file. Tarpit
controls, live add/remove service flow, and the per-service config
toggle stay inside the card — those are tile-local UI concerns and
only ever rendered from this component anyway.

- New DeckyFleet/DeckyCard.tsx
- DeckyCard.test.tsx covers identity row + services rendering,
  admin-gated FORCE MUTATE visibility, the FORCE MUTATE callback,
  TEARDOWN -> CONFIRM toggle when armed matches, and card-body
  click firing onInspect. AddServiceConfigModal +
  ServiceConfigForm are vi.mock'd so we don't need MSW handlers
  for their unrelated network fetches.
- DeckyFleet.tsx loses the inline component plus the now-unused
  imports it dragged in (Network/PowerOff/RefreshCw/Plus/X icons,
  ServiceConfigForm, AddServiceConfigModal, useCallback).
2026-05-09 04:58:25 -04:00
b6ff288dcf refactor(decnet_web/DeckyFleet): move IntervalEditor out
Verbatim move of the per-decky mutation-interval modal (~60 LOC)
into its own file. Saves null when the toggle is off, minutes
otherwise.

- New DeckyFleet/IntervalEditor.tsx
- IntervalEditor.test.tsx covers null-current disabled path,
  numeric-current enabled path, and CANCEL not firing onSave.
- src/test/fixtures/decky.ts now derives DeckyFixture from the
  canonical Decky type (the fixture's loose swarm shape was
  missing host_address/host_status; aligning to Decky catches
  that statically).
2026-05-09 04:55:30 -04:00
032ffbb4eb refactor(decnet_web/DeckyFleet): move DeckyInspectPanel out
Lift the right-side inspect drawer (~115 LOC) into its own file.
This is a verbatim move — same JSX, same useEscapeKey + body
overflow lock, same swarm-section gating. Underscore-prefixed
helper calls (_dotFor, _stateColor) drop the leading underscore
since they're now imported from helpers.tsx.

- New DeckyFleet/DeckyInspectPanel.tsx
- DeckyInspectPanel.test.tsx covers identity-row rendering, the
  SERVICES chip list, the conditional SWARM block, and the close
  button callback.
- DeckyFleet.tsx loses the panel + the now-unused useEscapeKey
  import.
2026-05-09 04:54:10 -04:00
8c168c64a8 refactor(decnet_web/DeckyFleet): extract types + helpers
Foundation for the DeckyFleet split. Types and helpers move to
their own files so the upcoming subcomponent extractions can
import without reaching back through the parent module.

- New DeckyFleet/types.ts (Decky, SwarmDeckyRaw, SwarmMeta,
  Archetype, FilterKey, DeckyStatus). Names exported to match the
  pattern set by AttackerDetail/types.ts.
- New DeckyFleet/helpers.tsx (archetypeIcon, PickIcon, dotFor,
  hitsFor, stateColor). Underscore-prefixed call sites stay via
  import-rename so this commit changes zero behavior.
- DeckyFleet.tsx loses ~110 LOC of inline definitions plus the
  now-unused icon imports (Cpu / Database / Globe / Monitor /
  Shield / Terminal).
2026-05-09 04:52:48 -04:00
6d7c0b6419 test(decnet_web): raise coverage floor after AttackerDetail split
Phase 1 of the UI refactor is in. AttackerDetail dropped from
2,579 LOC inline data + JSX to a 408-LOC shell composed of
extracted sections, each with co-located tests. Lock the gain by
bumping the threshold floor in vite.config.ts:

  lines       0 -> 7
  functions   0 -> 6
  branches    0 -> 5
  statements  0 -> 7

Future PRs raise these; never lower. Phase 1 final scoreboard:
9 test files, 45 tests, all green.
2026-05-09 04:49:32 -04:00
d5efebd73d refactor(decnet_web/AttackerDetail): extract MailLogPanel section
Lift STORED MAIL into its own section and pull the mail drawer
selection state along with it. Section signals admin-gating
through the section's own props (mailForbidden), since the data
hook already converts a 403 into that boolean.

- New AttackerDetail/sections/MailLogPanel.tsx
- MailLogPanel.test.tsx covers row rendering, mailForbidden empty
  state, no-mail empty state, from_hdr/from_addr/mail_from
  fallback, and drawer open/close. MailDrawer vi.mock'd same as
  ArtifactDrawer.
- AttackerDetail.tsx loses the mail JSX block, mailItem state,
  and now-unused Mail/MailDrawer imports.
2026-05-09 04:48:44 -04:00
14713eb294 refactor(decnet_web/AttackerDetail): extract ArtifactsPanel section
Lift CAPTURED ARTIFACTS into its own section, taking the drawer
selection state with it (the parent shell no longer owns
artifact-modal state).

- New AttackerDetail/sections/ArtifactsPanel.tsx
  Drawer is rendered as a sibling of the section so its z-index
  and focus-trap behavior mirror the original.
- ArtifactsPanel.test.tsx covers row rendering with parsed SD
  fields, empty state, missing stored_as (no OPEN button), and
  the open/close cycle. ArtifactDrawer is vi.mock'd to a stub
  so we don't need MSW handlers for its content fetch.
- AttackerDetail.tsx loses the artifact JSX block, the artifact
  state, and now-unused Paperclip/Package/ArtifactDrawer imports.
2026-05-09 04:47:17 -04:00
9cee4b2e71 refactor(decnet_web/AttackerDetail): extract CommandsViewer section
Lift the COMMANDS collapsible — paginated table with header-bar
prev/next controls — into its own section. The page math
(cmdTotalPages = ceil(total/limit)) and conditional empty state
both live in the section now.

- New AttackerDetail/sections/CommandsViewer.tsx
- CommandsViewer.test.tsx covers title formatting (unfiltered vs.
  filtered), empty state, single-page pagination hiding, and
  prev/next button behavior
- AttackerDetail.tsx loses the IIFE-wrapped commands JSX block
  plus now-unused ChevronLeft/ChevronRight/Terminal imports
2026-05-09 04:45:41 -04:00
7b21f31078 refactor(decnet_web/AttackerDetail): extract ServicesTargeted section
Lift the SERVICES TARGETED collapsible — interactive two-tone badge
chips with click-to-filter — into its own section. The selection
state was already lifted into useAttackerDetail in the prior
commits, so the section just consumes serviceFilter /
setServiceFilter as props.

- New AttackerDetail/sections/ServicesTargeted.tsx
- ServicesTargeted.test.tsx covers badge rendering, empty state,
  inactive-click-sets-filter, and active-click-clears-filter
- AttackerFixture grows ip_leaks/ip_leaks_total fields so the
  TimelineSection rotation test (added in the prior commit) keeps
  passing under the new factory shape
2026-05-09 04:44:25 -04:00
95e1a4ab7a refactor(decnet_web/AttackerDetail): extract TimelineSection
Lift the TIMELINE collapsible (timestamps, ASN, reverse DNS,
leaked-IPs row with rotation detection) into its own section.
LeakedIPsRow + the rotation/inline-limit constants come along
since they were only ever used here.

Also moves the shared `Section` collapsible primitive into
AttackerDetail/ui.tsx so the remaining sections can adopt the
template without re-importing through the parent module.

- New AttackerDetail/sections/TimelineSection.tsx (LeakedIPsRow
  inline as a private helper)
- AttackerDetail/ui.tsx now exports both Tag and Section
- AttackerDetail.tsx loses LeakedIPsRow, the Section helper, the
  Timeline JSX block, and now-unused imports (ChevronUp, ChevronDown,
  AttackerData)
- TimelineSection.test.tsx covers timestamps, unknown-origin path,
  rotation badge, empty leaks, collapse, and toggle callback
2026-05-09 04:43:13 -04:00
f524d283b7 refactor(decnet_web/AttackerDetail): extract AttackerStats section
Lift the 5-up counter grid + the conditional scan-vs-interact row
into AttackerStats. The activity row's visibility predicate
collapses into a single boolean inside the section so the parent
no longer encodes UX rules.

- New AttackerDetail/sections/AttackerStats.tsx
- AttackerStats.test.tsx covers all-five counters, activity present,
  activity empty, and service_activity undefined paths.
2026-05-09 04:40:34 -04:00
653ae04e88 refactor(decnet_web/AttackerDetail): extract AttackerHeader section
Lift the header (IP, country tag, traversal badge, identity badge)
into its own section component. Tag helper moves to a shared
AttackerDetail/ui.tsx so future sections can reuse it without
re-importing through AttackerDetail.tsx.

- New AttackerDetail/sections/AttackerHeader.tsx (~50 LOC)
- New AttackerDetail/ui.tsx for shared presentational helpers
- AttackerDetail.tsx imports both; local Tag definition deleted
- AttackerHeader.test.tsx covers country present/absent,
  TRAVERSAL badge, IDENTITY click-through, identity null path
2026-05-09 04:39:30 -04:00
22cfb10617 refactor(decnet_web/AttackerDetail): extract data layer into useAttackerDetail
The AttackerDetail page body owned all 7 REST fetches plus 2 SSE
streams inline as 200+ lines of useEffect plumbing. Lift them into
a single hook so section components extracted in follow-up commits
consume typed values, not setState pairs.

- New ./AttackerDetail/types.ts holds the canonical AttackerData,
  BehaviouralObservation, AttributionPrimitiveState plus newly-named
  ArtifactLog / SessionLog / SmtpTargetRow / MailLog / CommandRow
  (previously inline anonymous types).
- New ./AttackerDetail/useAttackerDetail.ts owns:
  * GET /attackers/:id (404 -> ATTACKER NOT FOUND)
  * GET /attackers/:id/attribution (silent-tolerant)
  * GET /attackers/:id/commands paged with 422 alert preserved
  * GET /attackers/:id/{artifacts,smtp-targets,mail,transcripts}
    (mail surfaces a 403 boolean for the admin-gated viewer)
  * useAttackerStream + useIdentityStream subscriptions, including
    the live attribution-state-changed merge.
- AttackerDetail.tsx re-exports BehaviouralObservation /
  AttributionPrimitiveState so AttackerDetail.behaviour_panel.test
  and any future external importer keeps working unchanged.
- New useAttackerDetail.test.ts covers loading -> success, 404,
  paged commands offset, serviceFilter resets cmdPage, and mail 403
  via MSW handlers (the SSE hooks are vi.mock'd; jsdom can't host
  EventSource).

No behavior change for the rendered page; all 37 tests green.
2026-05-09 04:36:35 -04:00
07a7d4918c test(decnet_web): MSW-based test foundation for UI refactor
Phase 0 of the decnet_web refactor: stand up an MSW server, fixtures,
and a router-aware render helper so the upcoming god-component splits
(AttackerDetail first) can land with same-commit test coverage.

- msw devDep + setupServer wired into src/test/setup.ts
- src/test/server.ts re-exports server, http, HttpResponse, apiUrl()
- src/test/fixtures/{attacker,decky,canary,topology}.ts factories
- src/test/renderWithRouter.tsx wraps MemoryRouter + ToastProvider
- baseline coverage thresholds (0%) in vite.config.ts; raise per PR
- coverage/ added to decnet_web/.gitignore

Existing Orchestrator/AttackerDetail/ThemeLab tests stay on vi.mock
and continue to pass; new tests use MSW.
2026-05-09 04:30:51 -04:00
3318b15044 fix(decnet_web/Layout): theme toggle icon stays visible on hover
The global button:hover rule in index.css forces color: var(--bg)
+ matrix-glow on the lucide icon's currentColor stroke, making
the sun/moon icon disappear into the toggle button's tinted
background on hover. Pin color: var(--accent) and box-shadow:
none on .theme-toggle-btn:hover so the icon stays in its base
colour and the button doesn't pick up the wider button-hover
halo.
2026-05-09 04:18:48 -04:00
5a34b1846c fix(decnet_web/Layout): kill residual theme-swap open flash
Even with fill: 'both', the new pseudo paints once at its default
style (no clip-path = full size) before the JS animation
registers — the brief open flash that survived the previous fix.

Pre-publish click coords as --reveal-x / --reveal-y on <html>
before calling startViewTransition. The static CSS rule on
::view-transition-new(root) now sets clip-path: circle(0px at
var(--reveal-x) var(--reveal-y)) as the pseudo's default, so
the very first paint is already fully clipped. The animation
then grows the circle outward from there.
2026-05-09 04:17:50 -04:00
ccff1467b1 fix(decnet_web/Layout): outward theme reveal, no flash either end
ANTI prefers the new theme growing outward from the click point
(visually clearer cause-and-effect than the old theme burning
away). The original outward implementation flashed at the start
because the new pseudo defaulted to its computed style (no
clip-path = fully visible) for one frame before the JS animation
registered.

Switching the animation's fill from 'forwards' to 'both' enforces
the start keyframe (circle(0) at click point) before the first
paint, in addition to pinning the end keyframe through pseudo
teardown. New layer is invisible until the animation begins,
fully visible until cleanup. No flash either end.
2026-05-09 04:17:07 -04:00
6d1fc3a081 fix(decnet_web/Layout): theme swap end-of-animation flash
Without fill: 'forwards' the clip-path keyframes release at
animation end and the pseudo reverts to its computed style
(no clip-path), so the old layer flashes back at full size for
a frame before View Transitions tears the pseudo-elements down.
Pinning the final keyframe with fill-forwards keeps the old
layer fully clipped through to teardown.
2026-05-09 04:15:44 -04:00
a81ea3f973 fix(decnet_web/Layout): theme swap animation no longer flashes opposite mode
Growing the NEW theme layer from circle(0) outward leaves a
one-frame gap where the new pseudo is fully opaque at full size
(the default state) before the clip-path animation registers.
Result: a flash of the destination theme right before the
reveal starts.

Inverted the layering and animation direction:
 - NEW theme snapshot sits on the bottom (z-index 0), static
 - OLD theme snapshot sits on top (z-index 1), shrinks via
   clip-path from circle(N) at click point down to circle(0)

The new layer is now hidden behind the old one until the old
shrinks away — no flash possible because the new layer was
never visible before the animation. Same 520ms duration, same
ease curve, same direction-of-travel from the user's POV
(circle expanding from cursor).
2026-05-09 04:14:54 -04:00
438a6e3e45 feat(decnet_web/Layout): topbar dark/light toggle with circular reveal
User-facing theme toggle ships now that the design system has
been audited end-to-end. A Sun/Moon button lives between the
threat indicator and the SYSTEM status pill in the topbar — same
slim 28x28 voice as the rest of the topbar controls, no chrome
shouting at the user.

Click coords drive a View Transitions API circle clip-path that
grows from the cursor to the farthest viewport corner over 520ms
with the project's standard --ease curve. Browsers without
startViewTransition (older Firefox, Safari < 18) fall through to
an unanimated swap — the hook returns instantly in that case.

Persistence is two-tier:
 - localStorage decnet_theme — the user's saved preference, the
   thing the topbar toggle writes. Survives reloads, applies
   everywhere.
 - sessionStorage decnet_theme_lab — dev-mode lab override (Task
   3). Tab-scoped, wins on boot so devs can A/B without nuking
   the saved preference.

App.tsx hydrates both on first mount in the right order so the
correct theme is on <html> before the first paint.

useThemeToggle is a small hook in lib/ rather than a Layout-only
helper so the same toggle can be reused later from a settings page
or hotkey.
2026-05-09 04:01:24 -04:00
9cab37db3a fix(decnet_web/css): three light-mode dimness fixes
--dim-color and --danger-color were referenced across drawers and
RemoteUpdates but never defined; --dim-color silently inherited
(defeating its purpose) and --danger-color fell back to literal
#f88 salmon (the 'ugly red' WifiOff icon next to UNREACHABLE
hosts). Added both as aliases in :root: --dim-color = var(--fg-3),
--danger-color = var(--alert).

--fg-2/3/4 alphas in light mode were tuned identical to dark
(0.78/0.55/0.35), but ink-on-cream needs more punch than
matrix-on-black at the same alpha — the deploy preview code
block (.code-block .comment / .key) and every dim caption
rendered too faint. Bumped to 0.88/0.70/0.50.

.maze-net-box.inactive applies opacity 0.42 + grayscale(0.7) for
the 'no traffic' signal. On cream that fades the LAN out of
visibility entirely. Override in light mode keeps the dotted
border as the dim-state cue and bumps opacity to 0.85 so the
header text stays legible.
2026-05-09 03:56:06 -04:00
388a968d89 fix(decnet_web/css): sweep violet rgba literals to tokens
Credentials drawer code-block labels (printable:, b64:) and a
dozen other violet wash/tint sites still carried bare rgba(238,
130, 238, *) literals — bright magenta in light mode where
--violet has resolved to charcoal-purple #2d1b4e. Mirrors the
prior matrix/alert/warn/info sweeps: by-alpha buckets land on
var(--violet-tint-10) or var(--violet).
2026-05-09 03:50:29 -04:00
aa0b22aacb fix(decnet_web/css): sweep rgba colour literals to tokens app-wide
Pre-this-commit, ~80 rgba() literals across 24 files were
hardcoding alert-red, warn-amber, info-cyan, panel-dark, and
white-text-with-alpha shades that bypassed the token cascade.
Net effect in light mode: the .eml/SESSREC drawers, AttackerDetail
verdict pills, MazeNET net-box headers, OPEN/REPLAY action
buttons, threat-intel cards, and all the dim 'whitish' overlays
stayed on their dark-mode hex values, producing the unreadable
panels in the screenshots.

Sweep maps each rgba colour family onto the existing token by
alpha bucket — rgba(13,17,23,*) -> var(--panel),
rgba(255,65,65,*) -> var(--alert)/-tint-10,
rgba(255,170,0,*) and rgba(224,160,64,*) -> var(--warn)/-tint-10,
rgba(0,200,255,*) -> var(--info)/-tint-10,
rgba(255,255,255,*) -> var(--fg-N)/var(--matrix-tint-N) by alpha.

VERDICT_TONE in AttackerDetail (MALICIOUS/SUSPICIOUS/BENIGN/
NO SIGNAL) was the worst offender — string literals
'#ff4d4d'/'#ffae42'/'#5fd07a'/rgba(255,255,255,0.4) baked into
inline JS styles. Now resolves at render time via var(--alert)/
var(--warn)/var(--ok)/var(--fg-4).

New tokens in :root:
 - --bg-color (alias of --bg) — drawers used this name with
   #0d1117 fallback that fired in every browser because nothing
   defined --bg-color. Adding the alias makes drawers re-tone.
 - --info / --info-tint-10 / --info-tint-30 — REPLAY buttons and
   any future neutral-secondary use.
 - --ok — semantic alias for 'verified good' (matrix in dark,
   emerald in light) so BENIGN pills stay readable across themes.

Login.css left intentionally — pre-auth surface, not themed.
2026-05-09 03:48:05 -04:00
11b2da7d54 fix(decnet_web/css): light-mode contrast across wizards, code blocks, hovers
Sweeps four invariant violations that were leaking dark surfaces
into light mode and producing the unreadable / inverted areas:

  1. Hardcoded `color: #000` in 14 :hover rules across 11 CSS
     files swapped to `color: var(--bg)` — collapses to #000 in
     dark mode (no-op), becomes cream in light. Fixes DEPLOY
     DECKIES (button hover was rendering charcoal-purple text on
     charcoal-purple background).
  2. Hardcoded `background: #000` (3 sites) and `#0d1117`
     (3 sites) replaced with `var(--bg)` / `var(--panel)`. Fixes
     code blocks and modal panels staying dark on cream — the
     deploy-wizard preview, topology-creation NAME input, and the
     MazeNET canvas backdrop now follow the active theme.
  3. `rgba(0,0,0,0.35)` and `rgba(0,0,0,0.5)` input/card
     backgrounds (ServiceConfigForm, DeckyFleet .input)
     swapped to `var(--panel)`. Fixes per-service config rows
     in the deploy wizard rendering as dark slabs.
  4. SVG arrow markers in MazeNET Canvas.tsx hardcoded
     `fill="#00ff41"` / "#ee82ee" — replaced with currentColor +
     style hook so they re-resolve on theme change.

New behaviour: light-mode hovers tint instead of inverting. The
dark-mode rules fully fill bg with --matrix/--violet/--alert and
flip text to --bg; that lands cream-on-near-ink in light mode
and reads as a jarring colour inversion every cursor move. Light
mode now layers a *-tint-10 background and keeps text in its
base colour. Single override block in index.css targets every
scoped `.X-btn`/`.btn`/`button:hover` via :is() + [class*="-btn"]
so we don't have to chase every component file.
2026-05-09 03:43:47 -04:00
34c778277a refactor(decnet_web/css): promote hardcoded matrix/warn/crit colours to tokens
37 bare rgba(0, 255, 65, ...) literals across 10 component CSS
files were forcing matrix-green to bleed into light mode no matter
what data-theme=light overrode in :root. They're now mapped onto
existing tokens by alpha bucket (0.025-0.05 -> --matrix-tint-5,
0.08-0.10 -> --matrix-tint-10, 0.18-0.30 -> --matrix-tint-30,
0.4 -> --fg-4, 0.5-0.6 -> --fg-3, 0.7-0.8 -> --fg-2).

Adds --warn (#e0a040), --amber (alias of --warn), --crit
(#e74c3c), and their tint-10 variants to :root, with
ink-friendly light-mode overrides. Sweeps bare #ffaa00 / #e0a040
/ #f59e0b / #ff4d4d / #e74c3c usages in the same files onto the
new tokens.

Files with var(--token, #fallback) patterns left alone — those
were already token-driven and the fallbacks just provide safety.
Login.css and inline TSX hex left for the per-page sweep.
2026-05-09 03:36:04 -04:00
df0c8e12e7 fix(decnet_web/css): light theme goes ink-monotone, not green-on-cream
Initial light-theme palette kept --matrix as a darker emerald
and --violet as a darker purple, which washed out badly on
warm cream — auth-helper chips, ACTIVE/PASSIVE/INACTIVE pills,
and CREDS/REUSE tabs all became unreadable because their tint
backgrounds + low-saturation text collapsed to sludge.

Light mode now collapses --matrix and --violet to near-ink
shades (#0d0d0d and #2d1b4e). --alert stays the one
saturated colour — the only element allowed to shout.
Dark mode is untouched; the matrix-vibe identity stays
exclusive to dark.

Also collapses the matrix/violet accent knob in light mode:
data-accent only flavours dark mode now, since two ink
shades are visually identical.
2026-05-09 03:31:18 -04:00
47c57271e7 feat(decnet_web/theme-lab): light theme tokens + dev toggle
Adds html[data-theme="light"] block to index.css overriding the
core six tokens (bg, matrix, violet, panel, border, alert), the
matrix/violet/alert tints, and the foreground opacity ramp to a
cream-on-ink palette anchored on #dbdad6. Glows are no-op'd —
light mode trades neon haloes for hard 1px borders.

Lab page gets a Dark/Light toggle that flips
html.dataset.theme and persists to sessionStorage
(decnet_theme_lab) — intentionally tab-scoped, not user-facing.
App.tsx hydrates the same key on boot so a tab reload keeps the
dev's chosen theme. The user-facing localStorage toggle ships
later via Config.
2026-05-09 03:23:50 -04:00
f3f7bff717 feat(decnet_web/theme-lab): kitchen-sink component zoo
Renders every primitive in the design system on the lab page so
theme-token edits can be evaluated against all states at once:
colour swatches with WCAG contrast vs --bg, the full type scale,
buttons (5 variants × default/hover/disabled), badges and status
pills, info/error banners, metric cards, table rows
(default/hover/selected/drop-target), form inputs, drawer panel
sample, and net-box compose states (internet/inactive/selected/
drop-target — independent classes layering, per memory).

Wrapper uses .fleet-root so global .btn/.btn.violet/etc resolve
identically to real pages. Lab-local CSS owns layout only — every
colour comes from index.css tokens.
2026-05-09 03:22:21 -04:00
846a50dbbf feat(decnet_web/theme-lab): scaffold dev-gated /theme-lab route
Adds VITE_DECNET_DEVELOPER build-time gate: when unset, the
isDeveloperMode() helper collapses to a constant false and Vite
tree-shakes both the lazy import and the conditional <Route> out
of the prod bundle.

ThemeLab is currently a header stub; subsequent tasks fill it
with the design-system primitive zoo plus a Dark/Light toggle
for live token tuning. Route is intentionally absent from
ROUTE_LABELS / sidebar — direct URL only.
2026-05-09 03:18:34 -04:00
65ddaaa681 fix(behave_shell/F.0): tighten prompt detector — log lines ending in '>' no longer vote
_detect_prompt_suffix accepted ANY line ending in $#%> as a PS1 prompt,
so a single `cat /var/log/dpkg.log` (195 lines closing in `<none>`)
flooded environmental.shell_type votes and flipped a plainly-bash
session to fish.

A prompt line now requires either a trailing space after the suffix
(default PS1 shape across bash/zsh/fish/PowerShell) or a PS1-shape
token (user@host, "PS " prefix, or a Windows drive-letter prefix).

Regression tests pin the dpkg.log false-positive and a $-terminated
prose line.
2026-05-09 02:57:40 -04:00
0c1fc68b13 feat(deploy): wire attribution worker — CLI + systemd unit + registry
* decnet attribution — Typer command mirroring decnet reuse-correlate
  (--multi-actor-tick, --daemon flags). Calls run_attribution_loop
  with the dependency-injected repo.
* deploy/decnet-attribution.service.j2 — systemd unit mirroring
  decnet-reuse-correlator.service.j2: ExecStart=decnet attribution,
  same hardening posture (NoNewPrivileges, ProtectSystem=full,
  ProtectHome=read-only, dedicated /var/log/decnet/decnet.attribution.log).
* worker_registry.KNOWN_WORKERS += "attribution" — heartbeat already
  publishes as system.attribution.health from
  attribution_worker._WORKER_NAME, so the Workers panel surfaces the
  row the moment the unit is enabled.
* api_start_all_workers preferred-order list + "attribution" between
  reuse-correlator and enrich so a fresh start-all brings it up
  alongside its peers.

After this commit `systemctl enable --now decnet-attribution` (or
the dashboard's start-all) actually launches the engine.
2026-05-09 02:31:59 -04:00
5253b32319 feat(decnet_web/AttackerDetail): attribution state badges (Phase 6)
Per-primitive state badge rendered next to each value in the
Behavioural Primitives panel. Five-state vocabulary, frozen, mirrors
decnet/correlation/attribution/aggregate.py:

  * STABLE      — green, low-key
  * DRIFTING    — amber, draws the eye
  * CONFLICTED  — red
  * MULTI-ACTOR — purple, loudest (cross-primitive escalation lives
                  in attribution.multi_actor_suspected, not the
                  per-primitive badge)
  * UNKNOWN     — neutral border, no fill

Wiring:

* GET /api/v1/attackers/{id}/attribution on mount + on id change.
  Failures swallowed silently (the worker may be off in dev).
* useAttackerStream gains attribution.state_changed +
  attribution.multi_actor_suspected named events. The state-changed
  handler merges by primitive and locks last_change_ts when the
  state did not actually flip (defensive — backend already gates
  these on transition, but a future relaxation shouldn't lie about
  "stable since X" on the badge tooltip).
* multi_actor_suspected is wired but unused by the badges; the
  per-primitive multi_actor signal already shows on each contributing
  primitive. The handler is in place so a future "two operators
  detected" banner has a live source.

Vitest: 4 new tests (badge renders only for mapped primitives, all
five states render with distinct labels, no badge when prop omitted)
on top of the existing 4. 7 of 7 pass; tsc + vite build clean.
2026-05-09 02:28:11 -04:00
5de4b5e290 feat(decnet_web/AttackerDetail): visual refresh of Behavioural Primitives panel
* Per-domain icons (Keyboard / Cpu / Clock / Activity / Globe / Sparkles).
* Domain headers use BEHAVIOUR_DOMAIN_LABELS with letter-spacing +
  primitive-count badge on the right.
* Bordered domain groups instead of flat list; aligned leaf / value /
  confidence columns with monospace value rendering.
* Section title: BEHAVIOURAL PRIMITIVES -> BEHAVE PRIMITIVES (matches
  the BEHAVE-SHELL extractor naming).
2026-05-09 02:24:37 -04:00
9cc3272a0d test(correlation/attribution): v0 calibration lockdown (Phase 7)
Four synthetic operator-behaviour scenarios at the merger level
(aggregate_observations) that pin v0's calibration:

* Stable HUMAN over 7 sessions   -> all primitives stable
* HUMAN switches to LLM mid-week -> primitives flip stable -> drifting
* Two operators alternating      -> primitives flag multi_actor
                                    (per-primitive; the cross-
                                    primitive multi_actor_suspected
                                    correlator is exercised by Phase 5)
* Single short session           -> all primitives unknown

Plus a threshold-lockdown test that asserts every named constant in
_thresholds.py against its v0 ship value. Anyone adjusting a
threshold without updating the scenarios fails this file.

This closes DEBT-051 at v0 — the attribution engine has a calibrated,
test-locked answer to "is this attacker stable / drifting / showing
multiple operators?" without crossing the persona-attribution bright
line. v1 (cross-attacker clustering, KD simhash linkage signal) is
gated on this v0 surface being stable in production for >= 1 month.
2026-05-09 02:23:10 -04:00
33f7d5a9ff feat(web): expose attribution state on AttackerDetail backend (Phase 6)
GET /api/v1/attackers/{uuid}/attribution

Returns the merger output for an attacker's identity:

    {
        "identity_uuid": "abc..." | null,
        "primitives": [
            {primitive, current_value, state, confidence,
             observation_count, last_change_ts, last_observation_ts},
            ...
        ]
    }

Pre-attribution-worker: identity_uuid=null, primitives=[]. Surfacing
identity_uuid keeps the cross-attacker rollup story visible to the
frontend ahead of v1's clusterer landing.

api_events SSE relay also subscribes to attribution.> and forwards
to the AttackerDetail page filtered on payload.identity_uuid (the
identity is resolved at stream open from the URL's attacker_uuid;
attribution payloads are identity-keyed, not attacker-keyed). New
SSE event names: attribution.state_changed,
attribution.multi_actor_suspected.

Frontend (AttackerDetail.tsx badge rendering, useAttackerStream
consumer) deferred — there's already WIP on AttackerDetail.tsx in
the working tree; merging the badge logic is a separate commit
once that lands.

Tests: 4 endpoint scenarios — 401 unauth, 404 unknown attacker,
200 empty (no stub), 200 with primitive-ordered rows.
2026-05-09 02:21:59 -04:00
e2c7e16793 feat(correlation/attribution): cross-primitive multi-actor detection (Phase 5)
Add tick_multi_actor() — periodic walk of attribution_state firing
attribution.profile.multi_actor_suspected when an identity carries
>= MULTI_ACTOR_MIN_PRIMITIVES rows in multi_actor state.

* Repo's list_multi_actor_identities() already filters to >= 2
  primitives; the correlator just dispatches.
* In-memory dedup keyed on identity_uuid -> frozenset(primitives):
  same set as last fire -> no re-emit. Set grows -> re-emit.
  Set shrinks below threshold -> evict so a future re-flap re-fires.
  Restart-resets are honest because attribution_state persists; a
  v1 multi_actor_suspect_log table can replace this if needed.
* run_attribution_loop() now supervises three concurrent tasks:
  observation handler, multi_actor tick loop, health/control. Tick
  interval comes from _thresholds.MULTI_ACTOR_TICK_SECS (60s) with
  test override.

Tests: 6 scenarios — single-primitive doesn't fire, two-primitive
co-flag fires, dedup blocks unchanged set, set growth re-fires,
threshold drop re-arms, multiple identities fire independently.
2026-05-09 02:18:42 -04:00
dd265d7520 feat(correlation/attribution): wire bus handler, persist state (Phase 4)
attribution_worker.handle_observation_event now executes the full
end-to-end path:

* ensure stub identity (Phase 1)
* observations_for_identity_primitive() — new repo helper joining
  observations through attackers.identity_id, so v1's clusterer
  gets cross-attacker rollup for free
* aggregate_observations() with ValueKind dispatched off the BEHAVE
  PRIMITIVE_REGISTRY; unknown primitives default to categorical
* upsert_attribution_state() — last_change_ts locked when state is
  unchanged so the dashboard can render "stable since X"
* publish attribution.profile.state_changed only on transition;
  idempotent re-runs over the same observation set fire nothing
  (loop-prevention invariant matching ttp.tagged)

Tests:
* 5 end-to-end attribution scenarios over in-memory SQLite + FakeBus.
* test_base_repo's DummyRepo + coverage body now stub every abstract
  surface BaseRepository declares — the 6 added by this branch plus
  the 12 left un-stubbed by earlier work (BEHAVE Phase 1, TTP
  rollups, iter helpers). The coverage test could not previously
  even instantiate.
* test_aggregate_categorical's dispatcher rejection updated for the
  Phase 3 + 4 contract — ValueError on unknown kinds, not
  NotImplementedError.
2026-05-09 02:16:12 -04:00
c39802a4bb feat(correlation/attribution): hash + numeric merge functions (Phase 3)
aggregate_numeric(): EWMA + dispersion (CV) over numeric primitive
values. Stable when CV < 20% AND mean shift < 30%; drifting on >= 30%
mean shift; conflicted on CV > 100%. Confidence is 1 - min(CV, 1).
multi_actor is intentionally NOT a numeric state — bimodal
distributions belong to the categorical detector once the value space
is bucketed.

aggregate_hash(): counts distinct hash values within
HASH_DRIFT_WINDOW_SECS of the most recent observation. 0 rotations =
stable, 1..HASH_DRIFT_MAX = drifting, > HASH_DRIFT_MAX = conflicted.
Reads rotation events; never recomputes hashes (DEBT-032 already
produces them via decnet.correlation.fingerprint_rotation).

aggregate_observations() dispatcher now routes "categorical" |
"numeric" | "hash" | None and rejects unknown kinds with ValueError
(louder than NotImplementedError now that all three v0 mergers
exist). 17 synthetic-input tests cover both new mergers and the
dispatcher.
2026-05-09 01:59:11 -04:00
4956977739 feat(correlation/attribution): categorical merge state machine (Phase 2)
aggregate_categorical(): pure function over a per-(identity, primitive)
observation list. Five-state vocabulary, last-N=5 window comparison
with one-outlier-tolerant majority threshold:

* unknown — < 3 observations
* stable — recent 5 agree (≥ 4 of 5 share top value), older 5 same
* drifting — recent 5 stable but disagrees with older 5, or older
  was conflicted and recent stabilised
* conflicted — recent 5 split, no two-value alternation pattern
* multi_actor — recent 5 split + alternation between exactly two
  values (operator A↔B handoff). Confidence capped at 0.6 per
  _thresholds.MULTI_ACTOR_MAX_CONFIDENCE; flapping primitives on
  flaky networks would otherwise look like two operators.

aggregate_observations() dispatcher honours value_kind="categorical"
(or None) and raises NotImplementedError for "numeric" / "hash" so
Phase 3 lands cleanly. 14 synthetic-input tests cover every state
+ boundary condition.
2026-05-08 23:18:22 -04:00
c2891d6cca feat(correlation/attribution): substrate + idle handler (Phase 1)
v0 Phase 1 of ATTRIBUTION-ENGINE.md:

* AttributionStateRow SQLModel keyed on (identity_uuid, primitive)
  per ANTI direction — re-keying state rows when the v1 clusterer
  merges attackers is the migration debt v0 should not bake in.
  ATTRIBUTION-ENGINE.md updated with the deviation note.
* AttributionMixin: ensure_stub_identity_for_attacker, idempotent
  upsert_attribution_state, get_attribution_state[_for_identity],
  list_multi_actor_identities (the Phase 5 correlator's read).
* attribution.profile.{state_changed,multi_actor_suspected} bus
  topics + builder; wiki Service-Bus.md updated separately.
* attribution_worker.py: subscribes to attacker.observation.>,
  ensures stub identity per event, logs and continues. No merger,
  no state writes, no derived events — Phase 4 wires those.
* attribution/{aggregate.py,_thresholds.py} skeletons: Phase 2
  fills _aggregate_categorical, Phase 3 adds numeric+hash+dispatcher.
2026-05-08 23:16:13 -04:00
e94ab608d9 fix(profiler/behave_shell): tolerate non-UTF-8 bytes in shard reads
Real-world bug surfaced on the first live decky run: sessrec.c's
json_escape (decnet/templates/_shared/sessrec/sessrec.c:111-141)
only escapes bytes < 0x20 + DEL — bytes >= 0x80 pass through raw.
An attacker pasting Latin-1 / GB18030 / any non-UTF-8 8-bit text
yields a shard line that chokes Python's default UTF-8 text-mode
read with 'utf-8 codec can't decode byte 0xac'.

Three changes:

1. _events_for_sid now opens with errors='surrogateescape', preserving
   byte fidelity through the JSON parse. Surrogate-half chars
   correctly fail isascii() / isalpha() so the typed-letter
   histograms filter them out automatically. Tightening sessrec.c to
   escape >= 0x80 is filed for v0.2 — that's the proper forensic-data
   fix; the surrogateescape read makes the engine robust meanwhile.

2. Regression test
   (test_handler_tolerates_non_utf8_bytes_in_shard) builds a shard
   with raw 0xAC bytes inside a JSON 'data' string and asserts the
   handler still persists observations.

3. Collector's _emit_session now logs at WARNING (was DEBUG) when
   find_shard_with_sid returns None, citing the three usual causes
   (ARTIFACTS_ROOT perms, _SERVICE_RE whitelist, sessrec/collector
   race). Surfaces the silent-skip class of bug in seconds instead of
   hours — the first live run hid a perm mismatch
   (User=anti without SupplementaryGroups=decnet) for an entire
   session window before the symptom was traced upstream.
2026-05-08 22:52:46 -04:00
69c8cfd2b9 test(profiler/behave_shell): Phase 6 smoke harness + live-decky runbook
Two-half deliverable per BEHAVE-INTEGRATION.md §587-594:

* scripts/behave_shell/replay_calibration.py — Python helper that
  drives the production handler against one asciinema shard, mints
  a temp SQLite repo + an Attacker per session, captures bus
  emissions in-process. Exits non-zero on zero-observation sessions.

* scripts/behave_shell/smoke.sh — bash entry that replays all five
  2026-05-02 calibration shards (HUMAN / YOU-sim / LW-sim /
  CLAUDE-FF / CLAUDE-CL). Auto-activates .311 venv, forces
  DECNET_DB_TYPE=sqlite, prints per-class summary. Suitable for CI.

* scripts/behave_shell/README.md — runbook covering both halves.
  Pins the manual live-decky procedure (one SSH session per class
  against a deployed smoke-decky, expected dominant primitives table,
  SQL verification query, AttackerDetail panel check, pass criteria).

* BEHAVE-INTEGRATION.md — Phase 6 completion log appended with
  current corpus results table (15 sessions, 424 observations across
  the five classes) and a note that the v0 tag (drop -pre) is gated
  on the manual live-decky round-trip and lands as a separate
  commit.

Live-decky run is intentionally NOT scripted — the integration doc
calls for manual SSH sessions per class so an operator confirms the
bus / collector / disk-reach plumbing under real PTY conditions.
2026-05-08 21:42:11 -04:00
b3ff80d74e test(decnet_web): vitest coverage for Behavioural primitives panel
Four tests pin the panel surface:
* Empty-state placeholder renders when no observations.
* Day-one priority primitives sort to the top of their group:
  motor.input_modality first in motor; the three cognitive priority
  primitives in documented order at the top of cognitive.
* Each row renders primitive leaf, value, and confidence-percent
  badge.
* Groups follow the canonical domain order
  (motor / cognitive / temporal / operational / environmental /
  emotional_valence); unknown domains alphabetise at the end.

Mirrors the Orchestrator.test.tsx harness shape (DEBT-043). Live
update path (useAttackerStream → setObservations) is exercised
indirectly via the static render — the hook is dumb glue and the
state mutation is React-side.
2026-05-08 20:27:40 -04:00
7634e31e5a feat(decnet_web/AttackerDetail): Behavioural primitives panel
Adds the AttackerDetail.tsx panel that surfaces BEHAVE-SHELL
behavioural primitives. Hydrates from the existing
GET /api/v1/attackers/{uuid} response field 'observations',
live-updates via the new useAttackerStream hook (replace-by-primitive
on every 'observation' SSE event).

* New BehaviouralPrimitivesPanel component, exported for vitest.
* Day-one render priority per BEHAVE-INTEGRATION.md §441-454:
  motor.input_modality, cognitive.feedback_loop_engagement,
  cognitive.command_branch_diversity,
  cognitive.inter_command_latency_class — these four sort to the top
  of their respective groups; everything else alphabetises.
* Grouped by top-level domain (motor / cognitive / temporal /
  operational / environmental / emotional_valence) with the canonical
  domain order; unknown domains alphabetise at the end.
* AttackerData interface gains an 'observations' field.
* Empty-state placeholder when the panel has nothing yet.
* Section collapse state extends to 'behavioural', defaults open.

tsc --noEmit clean. Vitest coverage ships in P5.4.
2026-05-08 20:26:55 -04:00
2ff2537f6c feat(decnet_web): useAttackerStream React hook
Per-attacker SSE consumer hook. Mirrors useIdentityStream's shape:
* Connects to /api/v1/attackers/{uuid}/events with ?token= auth.
* Per-event-name dispatch via addEventListener for snapshot,
  observation, fingerprint.rotated, attacker.scored.
* Reconnect-on-error backoff (3s).
* Callback refs so consumer rerenders don't tear down the connection.

The 'observation' event handler receives every primitive's update
through one event name; the primitive rides in payload.primitive
(matches the backend's _sse_name_for collapse decision).

Hook coverage rides on P5.4's panel test.
2026-05-08 20:24:19 -04:00
bb77d13f9a feat(api/attackers): per-attacker SSE events stream
GET /api/v1/attackers/{uuid}/events streams behavioural events for
one attacker. Mirrors decnet/web/router/topology/api_events.py
end-to-end: ?token= auth, require_stream_viewer gate,
sse_connection_slot per-user cap, snapshot-on-connect, three bus
subscriptions (attacker.observation.>, attacker.fingerprint_rotated,
attacker.scored) merged through asyncio.Queue, 15s keepalive,
request.is_disconnected() exit, finally task cancellation.

Per-attacker filter keys on payload['attacker_uuid'] which the
profiler worker stamps onto every published payload (Phase 5 P5.0
amendment) — O(1) drop without a repo round-trip per event.

_sse_name_for derives SSE event names:
  attacker.observation.<primitive> → observation.<primitive>
  attacker.fingerprint_rotated     → fingerprint.rotated
  attacker.scored                  → attacker.scored

10 tests cover snapshot, live forward, per-attacker filter (drops
other attackers' events), fingerprint.rotated forward, 404, 401, and
the sse-name derivation across all four cases. Topology events
regression green.
2026-05-08 20:23:29 -04:00
5116023bf7 feat(profiler/behave_shell): stamp attacker_uuid on bus payload (Phase 5 prep)
The profiler worker's per-observation publish now re-merges
attacker_uuid into the bus payload alongside id/ts/v. Same shape as
the existing DECNET-side deviation from BEHAVE's wire-format
docstring (BEHAVE-INTEGRATION.md §339-366) — widens the deviation
by one DECNET denorm field.

Phase 5's per-attacker SSE route can now filter
attacker.observation.* events to one attacker in O(1) without a repo
round-trip per event. identity_ref stays None today (until the
attribution engine ships); attacker_uuid is independent.

Two test changes:
* test_happy_path_persists_and_publishes asserts attacker_uuid is in
  every published payload.
* New test_attacker_uuid_in_payload_for_filter pins the field
  explicitly and confirms it doesn't conflate with identity_ref.
2026-05-08 20:18:32 -04:00
5ff89eefe7 feat(profiler): wire BEHAVE-SHELL extraction onto attacker.session.ended
The profiler worker now consumes attacker.session.ended on the bus
AND walks unprofiled session_recorded log rows on every tick. Both
paths converge on a single handler that:

1. Validates required payload fields (session_id, decky_id, service,
   attacker_ip, shard_path).
2. Builds evidence_ref shard:{decky}/{service}/{shard_basename}#{sid}
   and skips when has_observations_for_evidence is True (idempotent
   re-runs).
3. Resolves attacker_uuid via get_attacker_uuid_by_ip; defers if the
   profiler tick hasn't materialised the row yet.
4. Reads the asciinema shard, slices events for the sid, calls
   extract_session, persists each Observation via upsert_observation
   (per-row; batch transaction filed as follow-up), then publishes
   each on the bus best-effort (fire-and-forget per DEBT-029 §6).

Architecture:
* Handler lives in decnet/profiler/behave_shell/_handler.py — pure
  function, unit-tested in isolation.
* Worker.py adds _behave_pump (queue feed), _drain_behave_queue
  (per-tick drain), _behave_poll_tick (cursor scan over
  session_recorded logs), and _payload_from_log_row (Log → bus-shape
  payload projection).
* Poll cursor uses a separate state key
  (attacker_worker_session_cursor) so the correlation tick's cursor
  doesn't conflate.
* has_observations_for_evidence promoted to BaseRepository abstract.

22 new tests across handler / drain / poll layers covering happy
path, all skip paths, isolation against handler exceptions,
idempotency on re-run, and cursor key separation. TTP worker bus
tests still green — payload field is purely additive.

Closes BEHAVE-INTEGRATION.md Phase 4.
2026-05-08 18:57:45 -04:00
834aa613b1 feat(pyproject): pin decnet-behave-{core,shell} >=0.1.0,<0.2
Lock the BEHAVE library versions per BEHAVE-INTEGRATION.md
§Versioning. The profiler worker (Phase 4 wiring) imports
`Observation`/`Window` from `decnet_behave_core.spec.envelope` and
`event_topic_for`/`to_event_payload` from
`decnet_behave_shell.spec.event_adapter`; without the pin a broken
wheel or missing install would only show up on first publish.

Four-test smoke pins the public surface: envelope construction,
registry import non-empty, event-adapter topic shape, and the
adapter's id/ts/v exclusion contract.
2026-05-08 18:51:30 -04:00
bf3f9c746a feat(collector): enrich attacker.session.ended payload with shard_path
The collector's _SessionAggregator now resolves the asciinema shard
via find_shard_with_sid and stamps it onto every emitted
attacker.session.ended payload as `shard_path`. None when the shard
isn't on disk yet (collector race with sessrec flush) — consumers
treat that as "skip until next tick".

Additive field; existing TTP worker consumes the same topic and
ignores unknown keys, so no payload-version bump needed. Two new
tests pin the shard-found and shard-missing cases.

Unblocks BEHAVE-INTEGRATION Phase 4: the profiler worker reads
shard_path directly from the payload instead of disk-reaching.
2026-05-08 18:50:45 -04:00
588ea4e411 refactor(artifacts): extract shard-finder out of transcripts router
Move `_find_shard_with_sid`, `_resolve_shard`, `_validate_names`,
`_get_index`, and the index cache from
`decnet/web/router/transcripts/api_get_transcript.py` into
`decnet/artifacts/shards.py`. The shared module speaks
`ValueError`; the router keeps thin wrappers that translate to
`HTTPException(400)` so the route's error UX is unchanged.

This unblocks the BEHAVE-INTEGRATION Phase 4 worker wiring — the
profiler worker (and the collector's session aggregator) need to
disk-reach asciinema shards but must not import from a FastAPI
router.

11 new unit tests for the shared helper. Existing transcript router
tests pass (the shard fixture's monkeypatch points at the shared
module's ARTIFACTS_ROOT now).
2026-05-08 18:49:11 -04:00
aba1e37389 feat(profiler/behave_shell): H.5-pre extractor version marker (0.1.0-pre)
decnet.profiler.behave_shell.__version__ = '0.1.0-pre'.

The -pre suffix is honest: the extractor is feature-complete (37/37
Tier-A primitives emit, calibration grid honest), but the engine
package — worker wiring, observations writes, AttackerDetail panel —
still rides BEHAVE-INTEGRATION.md Phase 4. The actual 0.1.0 tag
lands when Phase 4 lands.

The marker version-tracks the engine, not the spec library
(decnet-behave-shell already at 0.1.0); they version independently.
2026-05-08 18:34:23 -04:00
9ebaca410a test(profiler/behave_shell): H.2 calibration grid full sweep
Run the five-class calibration grid (HUMAN / YOU-sim / LW-sim /
CLAUDE-FF / CLAUDE-CL) against the 2026-05-02 shards.

* Hard gate green for 27 primitives across all 5 shards.
* environmental.keyboard_layout moved from hard gate to
  PHASE_F_CONDITIONAL_PRIMITIVES — short SSH-recon corpus maxes at
  ~90 typed letters per session, well below the LAYOUT_MIN_TYPED_LETTERS
  (200) floor. The 200-floor stays per the per-phase "v0 ships when
  honest" rule; longer-text corpora will surface the layout signal.
* Three primitives never fire on the 2026-05-02 corpus, all already
  conditional and all expected:
  - cognitive.error_resilience.frustration_typing
  - environmental.locale
  - environmental.keyboard_layout

No D / F / G threshold re-tunes needed; only the keyboard_layout
binding-set move. Phase H step log appended to BEHAVE-EXTRACTOR.md
with per-class observation counts.
2026-05-08 18:33:51 -04:00
ac04751c18 test(profiler/behave_shell): H.1 registry-coverage test
Static assertion that every Tier-A primitive in PRIMITIVE_REGISTRY
has a slot in the calibration grid (hard gate or conditional set).
Excludes Tier B (8 cross-session primitives) and Tier C (toolchain.*)
by explicit allow-list and prefix filter.

Three checks:
* every Tier-A primitive is covered (forward direction)
* no extractor set drifts from the registry (reverse, catches typos)
* Tier-A count == 37 (design doc invariant)

CI now fails before a registry addition ships without a feature
function.
2026-05-08 18:30:50 -04:00
f10931f24d test(profiler/behave_shell): Phase G grid lockdown + completion log
Widen calibration binding from PHASE_ABCDEF_PRIMITIVES (25) to
PHASE_ABCDEFG_PRIMITIVES (28 hard). Three Phase G primitives that
emit on any session-with-commands ride the hard gate:

* operational.opsec_discipline
* operational.cleanup_behavior
* emotional_valence.stress_response

The remaining five Phase G primitives ride a new
PHASE_G_CONDITIONAL_PRIMITIVES because their sample-size floors make
them legitimately absent from short shards:

* operational.objective                  (≥ 3 classified commands)
* operational.multi_actor_indicators     (≥ 8 commands)
* emotional_valence.arousal              (typing bursts)
* emotional_valence.valence              (≥ 80 typed letters)
* emotional_valence.frustration_venting  (≥ 30 typed letters)

Backwards-compat alias PHASE_ABCDEF_PRIMITIVES kept. Phase G
completion log + checkbox flips in BEHAVE-EXTRACTOR.md.

Tier-A corpus delta: all 37 Tier-A primitives now emit. Phase H
(full-corpus lockdown + v0 release) is next.
2026-05-08 16:40:13 -04:00
79f253c969 feat(profiler/behave_shell): G.8 emotional_valence.frustration_venting
Binary read of ctx.obscenity_hits (G.0 lexical counter):
* detected — obscenity_hits ≥ 1
* none     — zero hits

Skip below FRUST_VENT_MIN_TYPED_CHARS (30). Confidence hard-capped at
0.5: 0.40 when detected, 0.50 only when cleanly absent over ≥ 200
typed letters, 0.30 otherwise.
2026-05-08 16:37:29 -04:00
40a283a7ec feat(profiler/behave_shell): G.7 emotional_valence.stress_response
Compare median post-error intra-command IATs against baseline
(commands not immediately following an errored command):

* ratio ≥ STRESS_EUSTRESS_RATIO_MIN (1.20) → eustress_positive
* ratio ≤ 1/STRESS_DISTRESS_RATIO_MIN     → distress_negative
* otherwise                                → none

Confidence hard-capped at 0.5; 0.30 below
STRESS_MIN_ERRORED_WITH_IATS (2).
2026-05-08 16:36:34 -04:00
d4dc7dff81 feat(profiler/behave_shell): G.6 emotional_valence.arousal
high_agitated when any of:
  * caps_run_max ≥ 5
  * bang_run_max ≥ 3
  * fastest typing burst median IAT < 0.06s with ≥ 30 IATs total

low_calm when slowest qualifying burst median IAT > 0.30s with ≥ 30
IATs. Else medium_engaged. Confidence hard-capped at 0.5; 0.30 below
AROUSAL_MIN_IATS.
2026-05-08 16:35:29 -04:00
3ba7e22b71 feat(profiler/behave_shell): G.5 emotional_valence.valence
Soft primitive — pure ratio over G.0 lexical counters:

* positive — positive_lex_hits > negative + obscenity, ≥ VALENCE_MIN_HITS
* negative — (negative + obscenity) > positive, sum ≥ VALENCE_MIN_HITS
* neutral  — fall-through

Skip below VALENCE_MIN_TYPED_CHARS (80). Confidence hard-capped at
EMOTIONAL_VALENCE_CONFIDENCE_CAP (0.5) inside the feature function;
0.30 below VALENCE_FULL_CONFIDENCE_MIN (200). Cap is registry
convention.
2026-05-08 16:34:27 -04:00
acf8382bcf feat(profiler/behave_shell): G.4 operational.multi_actor_indicators
Compare median intra-command IATs of the two temporal halves of the
session. ≥ MULTI_ACTOR_HALF_MIN_COMMANDS (4) per half required;
relative delta > MULTI_ACTOR_HANDOFF_DELTA (0.5) → handoff_detected.

team_coordinated is Tier B (cross-session); never emitted from a
single session. Confidence 0.55 with both halves ≥ 8 commands; 0.40
otherwise.
2026-05-08 16:33:15 -04:00
17b53dad4d feat(profiler/behave_shell): G.3 operational.cleanup_behavior
* thorough — ≥ CLEANUP_THOROUGH_MIN_DISTINCT (3) distinct
  cleanup-family hashes in tail-CLEANUP_TAIL_K (5).
* partial  — 1-2 distinct.
* none     — zero hits.

Adjacent to E.4's binary exit_behavior=cleanup; G.3 graduates the
intensity. Confidence 0.55 above 8 commands; 0.35 below.
2026-05-08 16:32:08 -04:00
337c7392b9 chore: untrack accidentally-committed threatfox-api.json
Slipped in via `git add -A` in the G.2 commit. Local artifact, never
intended for tracking.
2026-05-08 16:30:18 -04:00
09f598ce47 feat(profiler/behave_shell): G.2 operational.opsec_discipline
* careful — operator hits OPSEC_HISTORY_TOKENS AND tail-K commands
  include _CLEANUP_TOKEN_HASHES (re-imported from temporal.py).
* learning — history hit without cleanup-tail follow-through.
* careless — no history-clearing vocabulary at all.

Confidence 0.45 (small lexicon, soft); 0.30 below
MIN_COMMANDS_FOR_FULL_CONFIDENCE.
2026-05-08 16:29:48 -04:00
c11f3605be feat(profiler/behave_shell): G.1 operational.objective
Per-command intent classification via the G.0 lexicon
(`destructive > persistence > exfil > lateral > recon` precedence);
majority vote across classified commands. Skip emission below
INTENT_MIN_COMMANDS=3 classified hits. Confidence 0.40 below
INTENT_FULL_CONFIDENCE_MIN=6, 0.60 above.
2026-05-08 16:28:45 -04:00
289a64014c feat(profiler/behave_shell): G.0 intent lexicon + lexical counter pass
Phase G shared infrastructure (no primitive yet emitted):

* New `_intent.py` — five precomputed first-token-hash sets (recon /
  exfil / persistence / lateral / destructive) with documented
  precedence, plus opsec-history and three lexeme sets (positive /
  negative / obscenity) for the typed-text counter pass. Stop words
  that collide with registry value vocabulary (`no`, `hell`, `ok`)
  are deliberately excluded — the PII regression test catches such
  collisions.

* `_typed_char_histograms()` extended with five integer counters
  populated in the same single-pass walk: `obscenity_hits`,
  `positive_lex_hits`, `negative_lex_hits`, `caps_run_max`,
  `bang_run_max`. Longest-suffix match against bounded lexicon
  (`LEXEME_MAX_LEN`); paste-class events excluded.

* `SessionContext` widened by the same five fields. Drives G.5
  (valence), G.6 (arousal), G.8 (frustration_venting) without retaining
  raw operator text.

* Bump twisted >= 26.4.0rc2 to clear CVE-2026-42304 (pre-existing,
  caught by pre-commit pip-audit). Adjust ftp template type-ignore
  code from attr-defined to misc to match the new Twisted typing.

PII discipline: same shape as F.4 — fixed-vocabulary integer counters
on ctx, never on observations.
2026-05-08 16:27:25 -04:00
a25f4a890d test(profiler/behave_shell): Phase F + E.4 grid lockdown + completion log
Widens the binding calibration set from PHASE_ABCDE_PRIMITIVES (20)
to PHASE_ABCDEF_PRIMITIVES (25). The five new entries:

* environmental.shell_type (per-shard hard gate)
* environmental.terminal_multiplexer (per-shard hard gate)
* environmental.keyboard_layout (per-shard hard gate; PII boundary
  lifted by ANTI; emits all 4 registry values)
* environmental.numpad_usage (per-shard hard gate)
* temporal.lifecycle_markers.exit_behavior (resolution of the E.4
  hold; uses Command.followed_by_prompt from F.0)

environmental.locale joins a new PHASE_F_CONDITIONAL_PRIMITIVES set
(only fires on shards with an env / locale dump in the output).

Phase F completion log appended to BEHAVE-EXTRACTOR.md. The original
F.0 row hinted at D.0 subsumption; reversed in the log — D.0 is
enriched, not subsumed (regex catches errors when PS1 is suppressed).

Tier-A corpus delta: 25 of 37 primitives now emit. Phase G is next.
2026-05-04 00:44:22 -04:00
51ecd0924e feat(profiler/behave_shell): emit temporal.lifecycle_markers.exit_behavior
Resolves the E.4 hold from Phase E. F.0's Command.followed_by_prompt
gives us the exit-code proxy (prompt-after-last-command) we couldn't
get in Phase E.

Logic: last command without trailing prompt → abrupt; first_token_hash
in {exit, logout, quit, logoff} → graceful; any of the last K=3
commands' first_token_hash in {history, unset, rm, shred, clear, kill}
→ cleanup; else → graceful (clean Ctrl-D / window close).
2026-05-04 00:42:25 -04:00
c8166a6071 feat(profiler/behave_shell): emit environmental.numpad_usage
Sliding-window scan over single-char digit input events. A run of
NUMPAD_RUN_MIN (4) consecutive digit events whose pairwise IATs are
all ≤ NUMPAD_FAST_IAT_S (50ms) → detected. Otherwise → not_detected.
Skips below NUMPAD_MIN_TYPED_CHARS (50) typed chars. Confidence cap
0.50 per the registry's weak-signal flag.
2026-05-04 00:40:42 -04:00
cd7c7ea5a2 feat(profiler/behave_shell): emit environmental.keyboard_layout
ANTI authorised dropping the PII boundary for this primitive. ctx
gains typed_unigram_counts / typed_bigram_counts / typed_letter_count
populated during the existing single-pass input walk (paste-class
events excluded).

Two-axis classifier:
* layout-artefact unigrams take priority — q rate above floor with
  low English saturation → azerty; z above floor with y below → qwertz
* fallback to English-bigram saturation: ≥ floor → qwerty, else other

Sample-size floor 200 typed letters; bigram histogram capped at
top-64 to bound memory. Confidence cap stays moderate (0.40-0.55) —
heuristic discriminator.
2026-05-04 00:38:24 -04:00
b7ff5d2cc1 feat(profiler/behave_shell): emit environmental.locale
Searches ANSI-stripped output for LANG / LC_ALL / LC_CTYPE envvar
substrings emitted by env / locale / printenv. Highest-priority key
wins (LC_ALL > LANG > LC_CTYPE); POSIX value normalised to BCP-47:
en_US.UTF-8 → en-US, pt_BR.UTF-8 → pt-BR, C/POSIX → und. Free-string
registry value emitted directly.

PII discipline: only the parsed locale value enters observations;
surrounding output is read once for matching and dropped.
2026-05-04 00:35:31 -04:00
4257f7b6e2 feat(profiler/behave_shell): emit environmental.terminal_multiplexer
Scans RAW output (multiplexer escapes are themselves ANSI; never
strip first) for tmux markers (DCS passthrough, focus-reporting,
window-title with tmux marker) and screen markers (DCS, screen-OSC).
Detected → tmux/screen at 0.85; otherwise → none at 0.55. Skips
emission entirely when no commands — silence on a pure-echo or
empty session, per the smoke gates.

When both detected (nested mux), prefer tmux.
2026-05-04 00:33:44 -04:00
07ff5ff0c9 feat(profiler/behave_shell): emit environmental.shell_type
Per-prompt classification mode over ctx.prompt_lines. $/# → bash;
% → zsh; > with 'PS ' prefix → powershell; > with 'C:\' substring →
cmd.exe; > otherwise → fish. New _features/environmental.py module
opens Phase F.
2026-05-04 00:30:24 -04:00
1ff02f0c77 feat(profiler/behave_shell): F.0 prompt-line detector
Adds PromptLine dataclass + extract_prompt_lines() helper. PromptLine
carries ts, suffix_char ($/#/%/>), raw_line (ANSI-stripped, capped),
is_root flag. Populated during the existing single-pass output-window
walk; SessionContext gains prompt_lines, Command gains
followed_by_prompt.

PII trade-off (ANTI-authorised at Phase F): PS1 text retained on ctx
so F.1 / F.3 / E.4 can read it. Capped at PROMPT_LINE_MAX_CHARS=256.
Observations still only carry derived primitive values.

D.0's regex error helpers stay alongside (NOT subsumed) — they fire
even when PS1 echo is suppressed. F.0 enriches D.0 rather than
replacing it.
2026-05-04 00:29:08 -04:00
b7534c311a docs(behave): cross-reference Phase F.0 with held E.4 and landed D.0
F.0's row in BEHAVE-EXTRACTOR.md was forward-only — readers landing
on Phase F couldn't tell that F.0 also has a backlog (E.4 held, D.0
subsumption). Add a 'Carry-overs F.0 must unblock' section to the
Phase F prelude and a back-reference on the F.0 checkbox in the
implementation order checklist.
2026-05-04 00:17:37 -04:00
96a4039366 test(profiler/behave_shell): Phase E grid lockdown + completion log (E.4 held)
Widens the binding calibration set from PHASE_ABCD_PRIMITIVES (17) to
PHASE_ABCDE_PRIMITIVES (20). The three shipped Phase E primitives
(session_duration, escalation_pattern, landing_ritual) join the
per-shard hard gate.

E.4 (temporal.lifecycle_markers.exit_behavior) is held at ANTI's
direction pending Phase F.0's prompt parser — abrupt-vs-cleanup
needs exit-code visibility to be honest, and first-token membership
alone over-fires on benign rm / clear mid-session. E.4 picks up at
the tail of Phase F.

Phase E completion log appended to BEHAVE-EXTRACTOR.md; E.1-E.3
checkboxes flipped, E.4 left unchecked with a held note.
2026-05-04 00:16:33 -04:00
1341df2705 feat(profiler/behave_shell): emit temporal.lifecycle_markers.landing_ritual
Inspect the first N commands; if at least K of their first_token_hashes
match the recon-survey vocabulary (uname/id/whoami/pwd/hostname/w/who),
emit present, else absent. Hashes precomputed at module load; PII-safe.
v0.1 N=5, K=2.
2026-05-04 00:15:05 -04:00
d40495d71b feat(profiler/behave_shell): emit temporal.escalation_pattern
Bin commands into non-overlapping windows of width
max(ESCALATION_WINDOW_MIN_S, duration_s / ESCALATION_WINDOW_TARGET).
CV of per-window counts + zero-window fraction classify bursty /
sustained / erratic. v0.1; corpus re-tune deferred.
2026-05-04 00:13:45 -04:00
627fa59c15 feat(profiler/behave_shell): emit temporal.session_duration
Bucket ctx.duration_s against SESSION_DURATION_SHORT_MAX (60s) /
MEDIUM_MAX (600s) / LONG_MAX (3600s); else marathon. Direct
measurement, confidence 0.85. Skip emission only when no commands
and zero duration. New _features/temporal.py module opens Phase E.
2026-05-04 00:10:57 -04:00
46775fc0e5 test(profiler/behave_shell): Phase D calibration-grid lockdown + completion log
Widens the binding calibration set from PHASE_ABC_PRIMITIVES (13) to
PHASE_ABCD_PRIMITIVES (17). The four unconditional Phase D primitives
(cognitive_load, exploration_style, planning_depth, tool_vocabulary)
join the per-shard hard gate. The three error_resilience.* primitives
are conditional on at least one errored command in the shard and
tracked in PHASE_D_CONDITIONAL_PRIMITIVES — excluded from the
per-shard required-emission set, included in the cross-class
discrimination check.

cognitive_load empirical re-tune deferred to the next
BEHAVE_CALIBRATION_DIR run; v0.1 thresholds ship.

Phase D completion log appended to BEHAVE-EXTRACTOR.md; Phase D
checkboxes flipped to [x].
2026-05-04 00:03:46 -04:00
0fba6b6113 feat(profiler/behave_shell): emit cognitive.error_resilience.fallback_to_man
For each errored command, check whether the next command's
first_token_hash is in {man, help, info} (precomputed at module
load). At least one match → present, else absent. The --help / -h
flag forms aren't first tokens; v0.2 will reconsider once arg-token
hashing is justified by corpus.
2026-05-04 00:01:45 -04:00
8183218d29 feat(profiler/behave_shell): emit cognitive.error_resilience.frustration_typing
Compares median within-command IAT for commands following an errored
command vs commands following a successful one. Relative absolute delta
buckets to low / moderate / high. Skips when either group is empty
(no errors, or no clean baseline). v0.1; D.8 re-tunes.
2026-05-04 00:00:36 -04:00
b704352783 feat(profiler/behave_shell): emit cognitive.error_resilience.retry_tactic
Modal response across Command.errored=True commands:
* same first_token_hash on next command → rerun
* different first_token_hash         → switch
* no next command                    → abort
Tiebreak in registry order. The fourth registry value 'modify'
requires within-command arg diffing (PII boundary); deferred to v0.2.
2026-05-03 23:58:58 -04:00
f286c84d95 feat(profiler/behave_shell): emit cognitive.tool_vocabulary
Absolute distinct first_token_hash count, bucketed against
TOOL_VOCAB_NARROW_MAX / TOOL_VOCAB_BROAD_MIN. v0.1; D.8 re-tunes.
2026-05-03 23:56:22 -04:00
6c2e4ada83 feat(profiler/behave_shell): emit cognitive.planning_depth
Distribution of inter-command IATs bucketed against IKI_THINK_MAX_S
(deep) and INTER_CMD_INSTANT_MAX (reactive); fall-through is shallow.
v0.1 thresholds; D.8 re-tunes.
2026-05-03 23:55:16 -04:00
2254651270 feat(profiler/behave_shell): emit cognitive.exploration_style
Two-axis classification over the first_token_hash sequence:
repetition_rate (drilling) vs backtrack_rate (jumping among prior
tools). chaotic/targeted/methodical buckets. v0.1 thresholds; D.8
re-tunes.
2026-05-03 23:54:03 -04:00
f948e10830 feat(profiler/behave_shell): emit cognitive.cognitive_load
Composite over three [0, 1]-clipped sub-signals (chunking variance,
error rate from D.0's Command.errored, pace variability), mean-aggregated
and bucketed against COGNITIVE_LOAD_LOW_MAX / COGNITIVE_LOAD_MEDIUM_MAX.
Components missing data drop out of the mean rather than zeroing it.

v0.1 thresholds; D.8 re-tunes once D.2-D.7 are stable. Confidence
held at 0.60 (composite over soft sub-signals) and halved below the
5-command sample-size floor.
2026-05-03 23:52:29 -04:00
601986bd6d feat(profiler/behave_shell): output error-signal helper for Phase D
Lifts the error-signal slice of F.0 forward as a D.0 prelude. ANSI
strip + canonical bash/sh error fingerprints classify each command's
post-execution output window; Command gains errored / output_bytes
fields. PII discipline preserved — only a bool and an int leave the
helper, the stripped output text is dropped on return.

Drives D.1 (cognitive_load error_rate term) and D.5–D.7 (error_resilience
family). Phase F.0 will subsume this with PS1 + exit-code parsing.
2026-05-03 23:46:31 -04:00
bc62e42ce1 feat(profiler/behave_shell): emit motor.shell_mastery.pipe_chaining_depth 2026-05-03 23:34:54 -04:00
4fc980e968 feat(profiler/behave_shell): emit motor.shell_mastery.shortcut_usage 2026-05-03 23:33:07 -04:00
a077cf67c8 feat(profiler/behave_shell): emit motor.shell_mastery.tab_completion 2026-05-03 23:31:20 -04:00
771944830a docs(behave): close Phase B in BEHAVE-EXTRACTOR.md
Tick the four Phase B checkboxes (B.1-B.4) and append a Phase B
completion log inline (per the "append phase logs to design docs"
memory rule). Captures per-primitive confidence ranges, source
signals, and the PII-discipline regression that all four
primitives uphold.

Phase A + Phase B = 10 primitives emitting on every shard;
PHASE_AB_PRIMITIVES is binding for every subsequent phase.
Phase C (motor.shell_mastery.*) lands next.
2026-05-03 21:30:13 -04:00
8161c67ec5 feat(profiler/behave_shell): emit motor.command_chunking
BEHAVE-EXTRACTOR.md Phase B Step B.4. First implementation —
prototype doesn't ship this primitive.

* SessionContext gains intra_command_iats: per-command tuple of
  IATs between consecutive input events whose timestamps fall
  inside [cmd.start_ts, cmd.end_ts). Excludes the terminator IAT.
  Built by _per_command_iats.
* _features/motor.py:command_chunking(ctx) emits one Observation
  in {fluent, fragmented, single_command}.
  - 0 commands → skip emit
  - 1 command → single_command (registry-allowed point)
  - ≥2 commands → median CV across per-command typed-IATs;
    < CMD_CHUNKING_FLUENT_CV_MAX (0.50) → fluent, else fragmented
  - paste-only sessions (no command has ≥3 typed IATs) → skip emit
    (no honest within-command rhythm to measure)
  Confidence 0.80 / 0.65 / 0.60.
* Calibration grid widened to include motor.command_chunking;
  green across all five shards. Phase B primitive set complete.

Tests: no commands → skip, 1 command → single_command, uniform
typing → fluent, alternating fast/slow → fragmented, paste-only
multi-command → skip emit.
2026-05-03 21:29:31 -04:00
d04f91cd8c feat(profiler/behave_shell): emit motor.error_correction
BEHAVE-EXTRACTOR.md Phase B Step B.3. Replaces the prototype's
two-line "0 vs >0 backspaces" placeholder with a backspace-timing
classifier that honours the registry's full vocabulary.

* SessionContext gains backspace_count, backspace_iats (IAT from
  each backspace back to the preceding non-backspace input event),
  and kill_line_count (^U / ^W). Built by _scan_correction_signals,
  which retains only counts and timing aggregates — no character
  data leaves the helper, in line with the BEHAVE PII discipline.
* _features/motor.py:error_correction(ctx) emits one Observation
  in {immediate, deferred, absent, route_around}.
  - 0 backspaces + ≥1 ^U/^W → route_around (rewrite, not correct)
  - 0 backspaces + 0 kill-lines → absent
  - backspaces with median IAT ≤ 500 ms → immediate
  - slower → deferred
  Confidence 0.65 / 0.65 / 0.55 / 0.55.
* < 3 inputs → skip emit.
* Calibration grid widened to include motor.error_correction;
  green across all five shards.

Tests cover all four buckets, the < 3 inputs skip, and the PII
regression (raw command body never appears in the serialised
observation).
2026-05-03 21:27:46 -04:00
0737fcfe93 feat(profiler/behave_shell): emit motor.motor_stability
BEHAVE-EXTRACTOR.md Phase B Step B.2. First principled
implementation — the prototype doesn't ship this primitive at all.

* _features/motor.py:motor_stability(ctx) emits one Observation
  in {steady, variable, tremor}. Reuses ctx.typing_bursts from B.1.
* Tremor proxy: fraction of within-burst IATs below
  TREMOR_FAST_FLOOR_S (30 ms — humans can't sustain sub-50 ms IATs).
  ≥ TREMOR_RATE_MIN (10%) sub-floor → tremor (double-press / motor
  twitch / stuck-key).
* Otherwise median burst CV decides: < CV_STEADY_MAX → steady,
  else → variable. Confidence 0.70 / 0.60 / 0.65.
* No typing bursts or fewer than 5 within-burst IATs → skip emit.
* Calibration grid widened to include motor.motor_stability; green
  across all five shards.

Tests cover all three buckets + skip paths.
2026-05-03 21:25:54 -04:00
d90c8b70ce feat(profiler/behave_shell): emit motor.keystroke_cadence
BEHAVE-EXTRACTOR.md Phase B Step B.1.

* SessionContext gains typing_bursts: tuple[tuple[float, ...], ...]
  built by _split_typing_bursts(iats) — splits at gaps > IKI_THINK_MAX_S
  (1.5s) and drops bursts of fewer than 3 IATs. Mirrors prototype's
  _split_into_bursts at BEHAVE/prototype_extractors/shell/extract.py:275.
* _features/motor.py:keystroke_cadence(ctx) emits one Observation
  in {steady, bursty, hunt_and_peck, machine}. Median CV across
  typing bursts; mean IKI < IKI_MACHINE_MAX_S paired with CV <
  CV_MACHINE_MAX → machine. Confidence 0.85/0.70/0.65/0.60 per the
  prototype's calibration history.
* < MIN_INPUTS_FOR_CADENCE inputs or zero typing bursts → skip
  emission. v0.1 emits only the burst-CV variant; the prototype's
  NAIVE session-CV variant is parked for v0.2.
* Calibration grid widened (PHASE_A_PRIMITIVES → PHASE_AB_PRIMITIVES)
  to include motor.keystroke_cadence. Grid green across all five
  shards.

Tests: too-few-inputs → no emit, all-think-pauses → no burst → no
emit, uniform IATs → steady, sub-5ms → machine, mixed-pace → bursty,
extreme bimodal → hunt_and_peck.
2026-05-03 21:24:13 -04:00
0510cde073 feat(profiler/behave_shell): Phase A — calibration floor green
BEHAVE-EXTRACTOR.md Phase A Step 10. Closes the discriminative
floor: six primitives emit, the five-class calibration grid is the
binding regression test for every subsequent phase.

* Phase A checklist boxes (Steps 0-10) ticked in
  development/BEHAVE-EXTRACTOR.md.
* Phase A completion log appended inline to the design doc per
  the "append phase logs to design docs" memory rule — captures
  per-primitive confidence ranges and the 2026-05-02 empirical
  anchors that drove threshold calibration.
* Hard gate: tests/profiler/behave_shell/test_calibration_grid.py
  parametrised over five class shards, all green; skips cleanly
  on BEHAVE_CALIBRATION_DIR unset.

Phases B-G expand horizontally across the registry. Phase H is
the full-corpus lockdown + v0 release. Worker
(BEHAVE-INTEGRATION.md Phase 4) is unblocked at this milestone —
it can wire per-session production against the Phase A engine
without waiting for the rest of the Tier-A corpus.
2026-05-03 08:02:02 -04:00
640294f3dc test(profiler/behave_shell): five-class calibration grid lockdown
BEHAVE-EXTRACTOR.md Phase A Step 9 — the gate. Runs the pure
engine against each of the five 2026-05-02 calibration shards and
pins the contract that all subsequent Phase B-G PRs must keep
green: every Phase A primitive (motor.input_modality,
motor.paste_burst_rate, cognitive.inter_command_latency_class,
cognitive.command_branch_diversity, cognitive.feedback_loop_engagement,
cognitive.inter_command_consistency) fires at least once per shard.

* tests/profiler/behave_shell/test_calibration_grid.py
  parametrized over (shard_file, class_label) for HUMAN / YOU-sim /
  LW-sim / CLAUDE-FF / CLAUDE-CL. Skips entirely when
  BEHAVE_CALIBRATION_DIR is unset (CI provides the path; local dev
  doesn't have to).
* Plus a discrimination-smoke check: at least one primitive
  produces different majority values across present classes —
  catches the "constant-output regression" failure mode where the
  engine quietly degenerates to a stub.

Calibration tweak: BRANCH_DIVERSITY_LINEAR_MIN dropped from 0.80 to
0.70 to align with the prototype's empirical anchors (CLAUDE-CL ≈
0.55-0.60 adaptive; YOU-sim / CLAUDE-FF scripted recon ≈ 0.75+
linear). Test for the middle band re-pinned at the new boundary.

Per-class value pinning (e.g. HUMAN must emit
inter_command_consistency=bimodal) is intentionally NOT a hard gate
yet — v0.1 thresholds put real human sessions in "variable", and
true bimodal detection (Hartigan dip / two-peak) is registry-flagged
for v0.2. Tighter pinning lands as the corpus grows.
2026-05-03 08:00:50 -04:00
842b7de950 feat(profiler/behave_shell): emit cognitive.inter_command_consistency
BEHAVE-EXTRACTOR.md Phase A Step 8. Dispersion / bimodality of
inter-command pauses. HUMAN-bimodal vs LLM-metronomic.

* _features/cognitive.py:inter_command_consistency(ctx) emits one
  Observation in {metronomic, variable, bimodal}.
* CV = stdev / mean of ctx.inter_cmd_iats. CV < 0.40 → metronomic
  (LLM-pure; corpus anchor 0.24); CV ≥ 1.50 → bimodal heuristic
  (LLM-assisted human; v0.1 placeholder, true bimodal via Hartigan
  dip is registry-flagged for v0.2); else → variable (human;
  corpus anchor 0.94).
* < 2 IATs or zero mean → skip emission. < 5 commands halves
  confidence (0.40 vs 0.75) per sample-size honesty.

Tests: too-few IATs → no emission, uniform → metronomic,
human-like dispersion → variable, extreme bursts+gaps → bimodal,
low-sample-count → reduced confidence.

Step 8 closes the six-primitive calibration floor for Phase A.
Step 9 (calibration grid lockdown) is the gate that pins it.
2026-05-03 07:56:49 -04:00
2f8c107e70 feat(profiler/behave_shell): emit cognitive.feedback_loop_engagement
BEHAVE-EXTRACTOR.md Phase A Step 7. The orthogonal axis — does the
operator's pause-after-command correlate with bytes of output they
just saw? Splits HUMAN/CLAUDE-CL (closed_loop) from LW-sim/CLAUDE-FF
(fire_and_forget); cuts ACROSS the LLM/human axis.

* _features/cognitive.py:feedback_loop_engagement(ctx) emits one
  Observation in {closed_loop, fire_and_forget, unknown}.
* Pearson correlation between ctx.output_per_cmd[i] and
  ctx.inter_cmd_iats[i] (paired by construction in Step 4); via
  statistics.correlation with constant-series fallback to "unknown".
* r > FEEDBACK_CORRELATION_MIN (0.30) → closed_loop; otherwise
  (zero, negative, or undefined) → fire_and_forget.
* First primitive that depends on output events: zero output events
  in the shard or fewer than FEEDBACK_MIN_PAIRS (5) pairs → emit
  "unknown" at confidence 1.0 (the absence-of-data is itself a
  high-confidence answer). Zero-command session skips entirely.

Tests: no-output → unknown, few-pairs → unknown, strong positive r
→ closed_loop, constant pace → fire_and_forget/unknown,
negative r → fire_and_forget.
2026-05-03 07:55:38 -04:00
3fc6ea5f75 feat(profiler/behave_shell): emit cognitive.command_branch_diversity
BEHAVE-EXTRACTOR.md Phase A Step 6. Content-based playbook-vs-
adaptive split. Splits CLAUDE-FF (linear_playbook, ~10 distinct
tools) from CLAUDE-CL (adaptive_branching, 5-6 tools with curl
re-invoked) per the 2026-05-02 empirical anchor.

* _features/cognitive.py:command_branch_diversity(ctx) emits one
  Observation in {linear_playbook, adaptive_branching, unknown}.
* unique_first_token_hashes / total_commands ratio. ≥ 0.80 →
  linear_playbook, otherwise adaptive_branching (the doc instructs
  bias-to-adaptive in the middle band — that's the discriminative
  signal we actually want).
* < 5 commands → "unknown" at confidence 1.0 (the absence of data
  is itself a high-confidence answer per the registry's allowed
  vocabulary). Zero-command session skips emission entirely.

Tests cover unique-tokens → linear, repeated-tokens → adaptive,
middle band → adaptive (bias), under-floor → unknown @ 1.0, plus
PII regression: raw tokens never appear in the serialised
observation.
2026-05-03 07:54:13 -04:00
e52a0e0381 feat(profiler/behave_shell): emit cognitive.inter_command_latency_class
BEHAVE-EXTRACTOR.md Phase A Step 5. Classifies the operator's
thinking pace between commands. Splits LW-sim / CLAUDE-FF /
CLAUDE-CL.

* _features/cognitive.py:inter_command_latency_class(ctx) emits one
  Observation in {instant, typing_speed, deliberate,
  llm_lightweight, llm_heavyweight, long}, computed as the median
  of ctx.inter_cmd_iats bucketed against the prototype thresholds
  (v0.2 split: lightweight 2-8s, heavyweight 8-30s).
* Sample-size honesty: < 5 commands halves confidence (0.40 vs
  0.80) per BEHAVE-EXTRACTOR.md.
* Threshold consts (INTER_CMD_*_MAX, MIN_COMMANDS_FOR_FULL_CONFIDENCE,
  plus parked Step 6/7/8 thresholds for the next three commits)
  added to _thresholds.py.

Tests cover all six buckets at empirically-anchored IATs (15s ≈
Claude Opus driving recon via tmux send-keys), plus the
single-command no-IAT and low-sample-count paths.
2026-05-03 07:52:39 -04:00
f3880b24d1 feat(profiler/behave_shell): command segmentation in SessionContext
BEHAVE-EXTRACTOR.md Phase A Step 4. Pure refactor inside _ctx.py —
no new feature emits. Lays the shared utility for the three
cognitive primitives next in line (Steps 5-7).

* Command dataclass (frozen): start_ts, end_ts, first_token_hash.
  PII-safe by construction — only the first whitespace-delimited
  token of the command is retained, and only as a sha256 hash
  (decnet/profiler/behave_shell/_parse.py:hash_token).
* _segment_commands walks input events char-by-char, splits on
  \r / \n, hashes the first token, drops the rest.
* SessionContext gains commands, inter_cmd_iats, output_per_cmd.
  output_per_cmd[i] counts bytes between commands[i].end_ts and
  commands[i+1].start_ts — the natural pairing for Step 7
  (feedback_loop_engagement).

Tests: empty / unterminated streams, single command (CR + LF
terminators), paste-with-newline, multi-command IAT pairing,
output-byte counting between boundaries, blank-line skip,
first-token-only PII discipline.
2026-05-03 07:50:55 -04:00
6763fceb0b feat(profiler/behave_shell): emit motor.paste_burst_rate
BEHAVE-EXTRACTOR.md Phase A Step 3. Same paste-event ratio as
motor.input_modality but coarser-bucketed: this is the *habit*
signal (does the operator reach for paste at all?), where
input_modality is the dominant-channel signal.

* _features/motor.py:paste_burst_rate(ctx) emits one Observation
  per session in {none, occasional, habitual} with confidence
  0.70 / 0.70 / 0.80.
* Thresholds: PASTE_RATE_OCCASIONAL_MIN=0.10,
  PASTE_RATE_HABITUAL_MIN=0.50.

Splits YOU-sim from LW/CLAUDE-FF/CLAUDE-CL — LLM-driven sessions
paste habitually, real humans rarely paste.

Tests: pure-typed → none; 1-paste-in-10 → occasional;
paste-majority → habitual; output-only → no observation; habitual
confidence > occasional confidence.
2026-05-03 07:49:03 -04:00
879f5e731b feat(profiler/behave_shell): emit motor.input_modality
BEHAVE-EXTRACTOR.md Phase A Step 2. The first primitive — picked
first because it has the highest discriminative value (HUMAN vs
everyone) and the simplest implementation (paste-event ratio over
total inputs).

* _features/motor.py:input_modality(ctx) emits one Observation
  per session in {typed, pasted, mixed} with confidence 0.75 / 0.70.
* _features/_emit.py centralises the make_observation helper so
  every feature module gets the same Window/source/evidence_ref
  boilerplate without copy-paste.
* Thresholds inherited from the prototype's calibration history
  (MODALITY_PASTED_MIN=0.40, MODALITY_TYPED_MAX=0.05).
* Zero-input session skips emission — registry doesn't admit
  "unknown" here.

Tests: pure-typed → typed, pure-pasted → pasted, mixed → mixed,
output-only session → no observation, full envelope round-trip.
2026-05-03 07:47:38 -04:00
c9a81a23c2 feat(profiler/behave_shell): asciinema parser + paste-burst detection
BEHAVE-EXTRACTOR.md Phase A Step 1. Lays the shared primitives that
Steps 2-3 (motor.input_modality, motor.paste_burst_rate) will
consume:

* parse_shard_line / parse_shard turn a shard JSONL line/file into
  AsciinemaEvents, skipping headers and malformed records.
* PasteBurst dataclass + _detect_paste_bursts group consecutive
  paste-class input events (len(d) >= 4 chars per the prototype's
  empirical floor) into contiguous bursts, splitting on IAT gaps
  larger than PASTE_BURST_MAX_IAT_S (200ms).
* SessionContext now carries iats and paste_bursts derivations.
* Threshold constants harvested from
  BEHAVE/prototype_extractors/shell/extract.py — calibrated against
  the five 2026-05-02 shards.

Tests cover pure-typed, pure-pasted, mixed streams; close vs far
paste events; typed events breaking a burst; PasteBurst immutability;
and the JSON parser's junk handling.
2026-05-03 07:46:01 -04:00
f8eae04e5d feat(profiler/behave_shell): scaffold extract_session entry point
BEHAVE-EXTRACTOR.md Phase A Step 0. Lays the package skeleton
(__init__/extract/_parse/_ctx/_thresholds/_features) with empty
FEATURES = (), so the worker plumbing in BEHAVE-INTEGRATION Phase 4
has a stable import path before any primitive lands.

extract_session() builds a SessionContext once and fans the
registered feature functions across it; at Step 0 that fan-out is
empty and the function yields nothing. Step 1 (asciinema parser +
paste-burst detector) and Step 2 (motor.input_modality) land next.

Smoke suite asserts the empty contract: empty stream → no
observations, single event → t_start == t_end, multi-event → events
routed into input_events / output_events by kind, evidence_ref
defaults to "session:<sid>" or honours an explicit override.
2026-05-03 07:42:09 -04:00
a2a61b636e feat(web): drop SessionProfile, wire observations into AttackerDetail (DEBT-050 / DEBT-036 closure)
Destructive half of BEHAVE-INTEGRATION.md Phase 1. SessionProfile +
its kd_* columns + the dialect ALTER TABLE migration helpers are
deleted outright; pre-v1, the table shipped empty, no migration
ceremony required (per the no-new-_migrate_-pre-v1 memory rule).
DEBT-036 closes via DEBT-050 supersedure. AttackerDetail's
``observations`` field is wired to the new ``observations`` table
and returns an empty list until the BEHAVE-SHELL extractor (DEBT-050
Phase 2) starts emitting.

decnet/web/db/models/attackers.py — SessionProfile class deleted
(~135 lines), KD_PAUSE_*/KD_START_OF_ACTION_IDLE_S module constants
deleted, module docstring updated to point at the observations
table. AttackerIdentity.kd_digraph_simhash is KEPT — it's the v2
federation centroid hook, not a SessionProfile field; docstring
repointed to the BEHAVE primitive that will populate it.

decnet/web/db/sqlmodel_repo/attackers/sessions.py — DELETED.
SessionProfilesMixin dropped from the AttackersMixin MRO.

decnet/web/db/repository.py — abstract upsert_session_profile +
get_session_profile removed.

decnet/web/db/sqlite/repository.py + mysql/repository.py —
_migrate_session_profile_table helpers and their initialize() calls
removed. mysql initialize() now goes attackers → column_types →
admin (no session_profile step).

decnet/web/db/models/__init__.py — SessionProfile re-export gone.

decnet/web/db/models/attacker_intel.py — docstring cross-reference
to SessionProfile.schema_version retargeted to AttackerIdentity.

decnet/web/router/attackers/api_get_attacker_detail.py — adds
``observations: []`` to the response by calling
``repo.latest_observation_per_primitive(uuid)`` and projecting to a
list sorted by primitive path. Empty until the extractor lands;
shape matches BEHAVE-INTEGRATION.md §"AttackerDetail consumer".

tests/profiler/test_session_profile.py — DELETED (56 lines).
tests/db/test_base_repo.py — DummyRepo loses upsert_session_profile
and get_session_profile overrides.
tests/db/mysql/test_mysql_migration.py — initialize-call-order
assertion updated; session_profile step removed from the expected
sequence; docstring records why.
tests/ttp/test_lifter_absence.py — docstring "no SessionProfile" →
"no ObservationRow".
2026-05-03 07:33:37 -04:00
0972325527 feat(web/db): observations table + repo + bus prefix (BEHAVE-INTEGRATION Phase 1)
Additive Phase 1 of BEHAVE-INTEGRATION.md. Lays the storage layer
the BEHAVE-SHELL extractor (DEBT-050) will write into. Nothing
breaks; SessionProfile coexists for now and is dropped in the
follow-up commit.

decnet/web/db/models/observations.py — new ObservationRow SQLModel
mirroring the BEHAVE Observation envelope field-for-field
(core/decnet_behave_core/spec/envelope.py). ``id`` is a hex-string
UUID (matching BEHAVE), not a typed UUID column. ``identity_ref``
is str | None — written by the future attribution engine, NULL
until then. ``attacker_uuid`` is the one DECNET-side
denormalisation; FK'd to attackers.uuid for cheap AttackerDetail
joins. ``evidence_ref`` is NOT NULL for DECNET emissions even
though the upstream envelope makes it optional — the worker's
"already profiled?" check keys on it. UniqueConstraint(evidence_ref,
primitive) enforces idempotency at the schema level so re-running
the extractor on the same shard+sid produces a DB-side conflict
the upsert path resolves deterministically. Class is named
``ObservationRow`` (not ``Observation``) to avoid colliding with
the BEHAVE Pydantic envelope at sites that import both.

decnet/web/db/sqlmodel_repo/observations.py — ObservationsMixin.
Three public methods backing the canonical queries from
BEHAVE-INTEGRATION.md §"Storage": ``upsert_observation`` (idempotent
on the natural key), ``latest_observation_per_primitive`` (per-
primitive MAX(ts) subquery, portable across SQLite and MySQL — no
DISTINCT ON), ``observations_time_series`` (asc-by-ts). Plus
``has_observations_for_evidence`` for the worker's session-already-
profiled check.

decnet/bus/topics.py — ATTACKER_OBSERVATION_PREFIX = "observation"
constant + ``attacker_observation(primitive)`` builder. Full topic
shape ``attacker.observation.<primitive>`` matches what BEHAVE's
spec.event_adapter.event_topic_for produces upstream. Documentation
+ pattern matching only — bus auth is socket file perms (DEBT-029
§2), not topic-level.

decnet/web/db/repository.py — abstract ``upsert_observation``,
``latest_observation_per_primitive``, ``observations_time_series``
on BaseRepository.

tests/db/test_observations.py — 11 tests covering upsert round-trip,
idempotency under the unique constraint, latest-per-primitive
ordering across multiple sessions, time-series asc-ordering, empty-
attacker contract, every BEHAVE ValueKind round-tripping through
the JSON column, and the has_observations_for_evidence check.

tests/db/test_base_repo.py — DummyRepo gains the three new abstract
overrides so its coverage suite still instantiates.
2026-05-03 07:25:10 -04:00
11f474556c docs(behave): integration + extractor + attribution design (DEBT-050 / 051)
Three sibling design docs plus DEBT.md updates that supersede the
stale DEBT-036 with a BEHAVE-aligned plan.

development/BEHAVE-INTEGRATION.md — five-phase rollout: storage
(observations table mirroring the BEHAVE Observation envelope plus
one DECNET-side denorm; UniqueConstraint(evidence_ref, primitive)
enforcing idempotency); engine (in decnet/profiler/behave_shell/
sublibrary, no new daemon, not in BEHAVE — DECNET is the engine);
BEHAVE pin; worker wire; UI panel + per-attacker SSE route; live
smoke. Bus payload merges id/ts/v back in to preserve sensor
identifiers across the bus envelope.

development/BEHAVE-EXTRACTOR.md — engine route in eight phases
(A–H). Phase A locks the 6-primitive calibration grid; Phases B–G
expand horizontally; Phase H is the full Tier-A corpus + v0
release. v0 ships every shell-extractable primitive (37 of them);
Tier B is cross-session and lives in the attribution engine; Tier
C is network-domain (toolchain.*) and lives elsewhere.

development/ATTRIBUTION-ENGINE.md — sublibrary inside
decnet/correlation/ that consumes attacker.observation.* events
and emits attribution.profile.* derived state. Five-state machine
(unknown / stable / drifting / conflicted / multi_actor) with per-
ValueKind merge functions. v0 closes DEBT-051; v1 adds the real
clusterer; v2 federation gossip. The bright line forbidding
attribution to natural persons is lifted directly from BEHAVE's
envelope docstring.

development/DEBT.md — DEBT-036 marked STALE; DEBT-050 and
DEBT-051 entries added; summary table + open list updated.
2026-05-03 07:24:19 -04:00
3f080f601d feat(intel,ingester): mal_hash feed + observed_attachments table (DEBT-046)
New MalHashProvider sibling ABC (decnet/intel/base.py) since SHA-256
is a different keyspace from IntelProvider's IPs. MalwareBazaarProvider
mirrors FeodoProvider's bulk-feed shape: 24h refresh via _ensure_fresh
/ _refresh, in-memory set[str] of hex-lowercased hashes, set-membership
lookup. Auth-keyed via DECNET_MALWAREBAZAAR_AUTH_KEY; absent key
silent-no-ops the lane (single warning, no HTTP traffic).

Per-hash observations persist to a new observed_attachments table.
DECNET is a honeypot platform — every attachment hash an attacker
delivers is intel, regardless of whether anyone classified it. Verdict
is sticky: True never downgrades to False/None on subsequent
observations. Out of scope: API surface, federation export, retention.

Ingester _publish_email_received calls the provider for each attachment
sha256, sets mal_hash_match on the bus payload (omitted entirely when
the message had no attachments — keeps R0046's `is True` predicate
silent on hash-less mail, matching pre-paydown behavior), and upserts
the row regardless of provider availability.
2026-05-03 05:56:46 -04:00
03beff3840 feat(orchestrator): authoritative failure-count badge endpoint (DEBT-042)
New GET /api/v1/orchestrator/events/stats?since=1h&success=false&kind=...
backed by repo.count_orchestrator_failures(since_ts, kind), which
counts failed rows across both orchestrator_events and
orchestrator_emails since the cutoff.

Window parser accepts ^\d+[smhd]$, capped at 7d. Today only
success=false is accepted on this surface so the endpoint isn't
accidentally repurposed before the next consumer is properly
designed.

Orchestrator.tsx polls the endpoint on mount + every 30 s and
renders the authoritative DB-derived count instead of deriving from
the in-memory SSE buffer + one paginated page (which silently
excluded failures older than the local window).
2026-05-03 05:26:45 -04:00
866a76eccf test(web): scaffold vitest + RTL with Orchestrator seed suite (DEBT-043)
Wire vitest 4 + jsdom + @testing-library/{react,jest-dom,user-event}
+ @vitest/coverage-v8 through vite.config.ts (defineConfig from
vitest/config). src/test/setup.ts registers jest-dom matchers and
RTL cleanup. tsconfig.app.json picks up vitest/globals types.

Seed suite Orchestrator.test.tsx covers the three regressions
called out in DEBT-043: empty-state render, kind-filter toggling
triggers a scoped refetch, mocked stream callback prepends a row.
2026-05-03 05:20:01 -04:00
6c6f97e840 feat(prober,correlation): attacker fingerprint rotation detection (DEBT-032)
When the prober observes a NEW hash for an
(attacker_uuid, port, probe_type) triple it has seen before — VPS
rotation, SSH server rebuild, TLS cert swap — emit a derived
attacker.fingerprint_rotated event carrying both old and new hash.
Detection is a small library (decnet.correlation.fingerprint_rotation)
called inline from the prober at each of the three emit sites
(JARM/HASSH/TCPFP). No new daemon. New AttackerFingerprintState table
holds per-triple last-hash state; Attacker.rotation_count and
Attacker.last_rotation_at are stamped on every diff. Library is sync,
fully unit-tested via injected publish_fn / syslog_fn callbacks.
2026-05-03 05:12:51 -04:00
dcd558fd91 chore(infra): pin Docker base images by digest (DEBT-023)
All base images (debian:bookworm-slim, ubuntu:22.04, ubuntu:20.04,
rockylinux:9-minimal, centos:7, alpine:3.19, fedora:39,
kalilinux/kali-rolling, archlinux:latest, honeynet/conpot:latest)
now carry their resolved sha256 digest so 'docker pull' is
deterministic. :tag retained for human readability; @sha256 is what
Docker actually resolves. Refresh procedure documented at the top of
decnet/distros.py.
2026-05-03 04:38:39 -04:00
6e19d3a25a chore(bait): scaffold default seed dir with README
Empty directory tracked via .gitkeep so operators see it on first
clone; README documents the .eml/.json drop-in flow that the IMAP/POP3
compose fragments wire up by default.
2026-05-03 04:30:09 -04:00
b3a96a045f feat(mail): default email_seed → \$PROJROOT/bait/ when unset
When service_cfg["email_seed"] is absent, compose_fragment now falls
back to $PROJROOT/bait/ if that directory exists on the host. Lets
operators drop a deployment-wide bait corpus into one place without
threading email_seed through every decky's config. Missing dir keeps
old no-op behavior.
2026-05-03 04:25:24 -04:00
b88d67794d feat(mail): operator-tunable IMAP/POP3 email seed (DEBT-026)
IMAP_EMAIL_SEED / POP3_EMAIL_SEED accept a directory (rglob *.eml +
*.json) or a single .json/.eml. Loaded entries CONCATENATE with the
hardcoded _BAIT_EMAILS — additive to the realism-engine emailgen
output rather than replacing it. JSON dicts require from_addr /
to_addr / subject / body; bare bodies are wrapped into RFC 5322 on
load. compose_fragment reads service_cfg["email_seed"] and bind-mounts
the host path read-only at /var/spool/decnet-emails/seed.
2026-05-03 02:47:06 -04:00
e0b07651fd docs(debt): mark DEBT-047 resolved (EmailLifter disk-reach + ttp agent gate) 2026-05-02 20:07:54 -04:00
79674026dd feat(cli): allow decnet ttp on agents (DEBT-047)
The TTP-tagging worker is now safe to run on agent hosts: EmailLifter
disk-reaches body-aware predicates from the local artifacts tree
(DEBT-035 unblocked filesystem access; DEBT-047 added the helper).

Drop `ttp` from MASTER_ONLY_COMMANDS in cli/gating.py and remove the
defence-in-depth `_require_master_mode("ttp")` call in cli/ttp.py.
`ttp-backfill` walks the master DB and stays master-only.
2026-05-02 20:07:03 -04:00
e972d870de feat(ttp): EmailLifter disk-reach for body-aware predicates (DEBT-047)
R0047 (BEC) and the encoded-payload predicate substring-match against
the email body. Shipping raw body text on the abstracted service bus
is the wrong privacy stance — the bus transport may swap from UNIX
socket to networked at any time, and "loopback today" is not a license
to put PII on the wire.

EmailLifter now opens the .eml lazily from
/var/lib/decnet/artifacts/{decky_id}/smtp/{stored_as} when a body-aware
predicate runs and parses the body in-process via stdlib email +
policy.default. The decoded body is memoized into the payload dict so
multiple body-aware predicates on the same event open the file once.

Bus envelope only carries the artifact pointer (decky_id + stored_as);
raw body bytes never cross the host disk boundary on the agent → master
hop. Filesystem access on agents is unblocked by DEBT-035 (setgid +
group-readable artifacts root, paid 2026-05-02).

The legacy inline body_text path is preserved — when the producer ships
body_text on the bus the helper short-circuits without opening the file.
2026-05-02 20:05:54 -04:00
7036a86e76 refactor(artifacts): extract resolve_artifact_path to shared module
Move artifact path validation + symlink-escape check out of the
admin-gated download endpoint into decnet/artifacts/paths.py so the
TTP EmailLifter can disk-reach .eml files at tag-time without
duplicating regex/root logic (DEBT-047).

The router now catches ArtifactPathError and re-raises HTTPException(400);
behavior is unchanged.
2026-05-02 20:02:47 -04:00
cdbb3d3571 fix(ssh,telnet): move PROMPT_COMMAND out of /root/.bashrc + pin readonly
ANTI flagged two regressions in the existing command-event capture:

1. **Tell**: PROMPT_COMMAND lived in /root/.bashrc, the FIRST file
   an attacker greps after landing root. The logger invocation
   sitting there is plain-text honeypot signage.
2. **Bypass**: even when missed, `export PROMPT_COMMAND=""` silently
   disables capture. ANTI personally bypasses this on engagements.

Reshape:

* Move the assignment to **/etc/environment** — read by pam_env at
  session open (sshd via /etc/pam.d/sshd, telnet via
  /etc/pam.d/login), before any shell rc file fires. Far less
  obvious than .bashrc; a casual `cat .bashrc` no longer surfaces
  the capture.
* Define the helper as a function `__bash_history_sync` in
  **/etc/bash.bashrc** (system-wide bashrc, sourced by every
  interactive bash). Function name reads as generic bash
  housekeeping; no DECNET branding in the symbol.
* Pin both the function and PROMPT_COMMAND **readonly** so
  `export PROMPT_COMMAND=""` fails with "readonly variable"
  instead of silently winning. Mitigation, not airtight —
  `bash --norc` still bypasses — but the passive `export`
  bypass is closed.

The actual `logger --rfc5424 --msgid command ... CMD ...` invocation
is preserved exactly; only its location and the readonly guard
change. R0001–R0030 (command-rule pack) consume the same syslog
shape as before.

Three new tests assert: the value lands in /etc/environment, the
function body lives in /etc/bash.bashrc, no PROMPT_COMMAND line
remains in /root/.bashrc, and `readonly PROMPT_COMMAND` /
`readonly -f __bash_history_sync` are both present. Mirror
assertions added on the Telnet Dockerfile via
test_config_schema.py.
2026-05-02 19:50:24 -04:00
3e9c4c29b9 feat(ssh,telnet): add non-root user account for privesc + enum lure
Real Linux deployments (especially Ubuntu cloud images) ship a non-
root admin user; honeypots that only accept root logins are a tell.
Add a second account on both SSH and Telnet decoys, configurable
via service_cfg keys `user` / `user_password`, defaulting to
`ubuntu` / `admin` so the lure is live on every fresh deploy.

* `decnet/services/{ssh,telnet}.py` — two new ServiceConfigFields
  (`user` string, `user_password` secret) and matching env vars
  (`SSH_USER` / `SSH_USER_PASSWORD`, mirror for telnet) propagated
  via the compose fragment.
* `decnet/templates/ssh/entrypoint.sh` — runtime `useradd -m -s
  /usr/libexec/login-session -G sudo "$SSH_USER"` so the new user
  inherits the same sessrec pty-recording shell as root and lands
  in the sudo group. Privesc attempts (`sudo`) flow through the
  existing sudo-log capture; network-enum from the user's shell
  rides the recorded transcript.
* `decnet/templates/telnet/entrypoint.sh` — same useradd pattern
  (no sudo group — busybox+login telnet image has no sudo
  package; privesc rides `su -` which itself flows through the
  existing PAM auth-helper at /etc/pam.d/login).
* New tests for default + custom user / password + independence
  from root password. Updated the schema-keys assertion to match
  the four-field shape.

The new account is ALSO the natural home for the body-aware
predicates that were previously gated on root-only sessions —
attackers who land on `ubuntu@host` and run network-recon /
privesc commands now generate the same structured TTP-rule
events as root sessions did, captured via the same auth-helper
+ sessrec + sudo-log pipes.
2026-05-02 19:48:03 -04:00
c675bd26cf docs(debt): mark DEBT-035 resolved; lift DEBT-047 filesystem-access blocker
DEBT-035 (artifacts written as the container uid, not the API's) is
resolved by the two preceding commits:
* 39a298f6 — persists DECNET-service api-user/api-group as names in
  decnet.ini for any future composer / worker that wants to resolve
  the local uid via pwd.getpwnam.
* b2733216 — creates /var/lib/decnet/artifacts at init time with mode
  0o2775 (setgid + group-write) owned by the DECNET-service
  user:group.

The setgid bit is the load-bearing fix: Linux mkdir(2) propagates a
parent's group AND its setgid bit to every new subdirectory. Docker
auto-creates the per-decoy / per-service subtree as bind-mounts fire,
so those subdirs come up with group=decnet and setgid set; container
file writes (default umask 0o022 → mode 0o644) inherit the decnet
group; the API process and the local TTP worker (both running as the
DECNET-service user, primary group decnet) read via group-read.

The original recommendation of compose `user:` injection turned out
infeasible for SSH and Telnet — PAM's setuid(2) during login
fundamentally cannot run from a non-root container. Setgid covers
both root-internal and unprivileged-internal templates uniformly
without requiring per-template carve-outs.

DEBT-047 (R0047 BEC disk-reach) was gated on DEBT-035 for filesystem
access. That blocker is lifted — `decnet ttp` running on agents as
the local DECNET-service user can now read .eml files written by
the SMTP decoy. The remaining DEBT-047 work is the master-only gate
flip in decnet/cli/gating.py and the EmailLifter disk-reach helper
itself (factor _resolve_artifact_path out of the artifacts API
endpoint into a shared module).

Soft-fail paths in api_get_transcript.py and api_get_artifact.py
stay as defence-in-depth — option 2 should make them never fire on
a healthy install but a misconfigured deploy must not 500 the API.
2026-05-02 19:40:12 -04:00
b27332169d feat(init): create /var/lib/decnet/artifacts with setgid + group-write
DEBT-035 step 2. Today the artifacts subtree is auto-created by
Docker as root when a decoy container's bind-mount fires for the
first time. The resulting permissions are root:root 0o755 — the API
process (running as the decnet user) hits PermissionError trying to
read transcripts written by the container, and the soft-fail 404
path gets exercised on every fresh deploy.

Add `/var/lib/decnet/artifacts` to init's dirs list with mode 0o2775:

* 0o2000 — setgid bit. New files inherit the directory's group
  (decnet), regardless of which uid created them. This is the load-
  bearing bit for cross-container reads.
* 0o0775 — owner+group rwx, world rx. Group-write lets the API
  process and the local TTP worker read each other's outputs
  without a manual chown.

`_ensure_dir` already respects the full mode word via `os.chmod`,
no helper change needed.

Test asserts the resulting directory carries exactly 0o2775 after
a fresh `decnet init --prefix`. Defence-in-depth: this works even
if the per-decoy compose `user:` directive (next commit) misses a
template — files still land in the decnet group.
2026-05-02 19:35:20 -04:00
39a298f685 feat(init): persist DECNET-service api-user/api-group to decnet.ini
DEBT-035 step 1. The composer needs to know which uid/gid to inject
into each compose fragment's `user:` directive at deploy time. Today
the resolved `--user` / `--group` values reach systemd unit
rendering (init.py:349–354) but are not persisted anywhere the
composer can read them.

Persist as **names** (not numeric ids) under `[decnet] api-user` /
`api-group` in the rendered decnet.ini placeholder. Resolution to
uid/gid happens at deploy time on whichever host runs the deploy,
via `pwd.getpwnam(...)` / `grp.getgrnam(...)` — so the same user
name can have different uids on master vs agents (heterogeneous
/etc/passwd) without breaking artifact ownership. The existing
config_ini auto-translates kebab→DECNET_API_USER / DECNET_API_GROUP
at load time; no domain-map changes needed.

Two new tests: one asserting the rendered ini carries the
`api-user` / `api-group` keys for the values passed to `--user` /
`--group`; one round-tripping through `load_ini_config` to confirm
the env vars land in `os.environ` for the composer to pick up.
2026-05-02 19:33:53 -04:00
b3ea3fa925 docs(debt): merge rogue root DEBT.md into the canonical development/DEBT.md
A previous agent (and several of my own commits) wrote to a top-level
DEBT.md without seeing the existing development/DEBT.md — the
canonical register since DEBT-001. Resulted in two parallel files,
inconsistent numbering schemes, and references that resolved to the
wrong place.

Migrate the six entries that landed in the rogue file into the
canonical register as DEBT-044 through DEBT-049, preserving their
status (resolved / partial / open) and cross-references. The
TTP_TAGGING.md references to "DEBT.md" already resolve to
development/DEBT.md by virtue of being in the same directory; only
the comment in decnet/ttp/impl/intel_lifter.py needed disambiguation
to "development/DEBT.md DEBT-048".

* DEBT-044 — `attacker.email.received` producer wiring ( RESOLVED 2026-05-02)
* DEBT-045 — EmailLifter heavyweight feature extraction (PARTIAL PAID 2026-05-02)
* DEBT-046 — EmailLifter mal-hash feed integration (open)
* DEBT-047 — EmailLifter R0047 BEC unblock (open, gated on DEBT-035)
* DEBT-048 — TTP intel provider mapping review (recurring quarterly)
* DEBT-049 — TTP Sigma adapter — post-v1 (open)

Summary table extended; "Remaining open" line updated; root file
removed. The DEBT-047 entry now explicitly cross-references DEBT-035
as the gating dependency for the R0047 BEC unblock.
2026-05-02 19:17:20 -04:00
17367d0a69 docs(debt,ttp): retire shipped lanes; file mal-hash-feed and R0047-disk-reach entries
Mark the EmailLifter heavyweight follow-up as PARTIAL PAID — R0042 /
R0046 (macro / password / smuggling lanes) / R0048 fire end-to-end
after commits 291b78c1 (decky extractors) and the ingester producer
projection that follows.

Two narrower DEBT entries replace the lanes that remain gated:

* "EmailLifter mal-hash feed integration" — R0046's mal_hash_match
  lane needs a curated bad-hash feed (MalwareBazaar SHA-256 dump as
  the v0 candidate, mirroring the FeodoProvider bulk-feed pattern at
  decnet/intel/feodo.py). Feed integration, not extraction. Lifter
  predicate already reads `payload.get("mal_hash_match")` — silent
  today only because the field is absent.
* "EmailLifter R0047 BEC — unblock when artifact disk-reach lands"
  cross-references the agent UID/GID DEBT entry that blocks
  `decnet ttp` from reading artifacts written by deckies on the
  same host. Disk-reach is the intended solution; raw body_text on
  the bus is rejected because the bus transport is abstracted (the
  UNIX-socket implementation may swap to networked at any time, and
  privacy decisions must hold regardless of transport).

Append to TTP_TAGGING.md §"Producer wiring": the email.received
producer pointer (was "none — DEBT"), the full per-message payload
shape with the new heavyweight fields, and an explanatory block on
why the bus is body-text-free + how R0047 / R0048 each handle their
body dependency (R0048 via the precomputed scalar; R0047 deferred).
2026-05-02 19:12:30 -04:00
c714941069 feat(bus): project EmailLifter heavyweight fields onto email.received
The decky's Layer-2 extension (commit 291b78c1) emits body_simhash /
body_base64_bytes / html_smuggling on the message_stored log and adds
macro_indicator / encrypted booleans to each attachments_json
manifest entry. Lift them all onto the email.received bus payload:

* body_simhash — passes through as-is (16 hex chars or "")
* body_base64_bytes — coerced to int (0 on absent / malformed)
* attachment_macros / attachment_password_protected — OR-reduced
  across the per-attachment manifest booleans; matches R0046's
  matched_trigger semantics where a single positive lane fires the
  rule
* html_smuggling — coerced bool from the decky's 0/1 int

Pre-Layer-2 message_stored events (older deckies, malformed log
rows) project to safe defaults: empty simhash, zero base64-bytes,
all booleans False — the EmailLifter then stays silent, never
fires a false positive on missing data.

R0042 (mass-phish) / R0046 macro / R0046 password / R0046 smuggling
/ R0048 (encoded payload) all fire end-to-end after this commit.
R0046 mal_hash_match and R0047 BEC remain deferred per their
respective DEBT entries (filed in the next commit).
2026-05-02 19:10:30 -04:00
291b78c1d0 feat(smtp): extract body_simhash + base64-bytes + html-smuggling + per-attachment macro/encrypted
Heavyweight Layer-2 extractors land alongside the cheap projections
shipped in commit e9324aca, so the EmailLifter R0042 / R0046 (macros
/ password / smuggling lanes) / R0048 fire from the bus payload
without the lifter having to reach back to disk.

Extractors:
* body_simhash — inlined 64-bit Charikar simhash (md5-keyed,
  frequency-weighted) over word tokens of the union of text/* body
  parts. Inlined rather than pulling the `simhash` PyPI dep, which
  transitively brings numpy ~50 MB into a slim decky container; the
  algorithm is ~15 lines and identical in extraction quality.
* body_base64_bytes — largest decoded base64 chunk's byte count,
  scanning text body parts with the same `_BASE64_RE` the lifter's
  `_p_encoded_payload` fallback uses. R0048 fires from this scalar
  alone; the lifter's body_text fallback becomes dead in normal
  operation.
* attachment_macro_indicator — stdlib zipfile sniff for
  `vbaProject.bin` inside OOXML containers. Catches modern .docm /
  .xlsm / .pptm and macro-injected .docx; legacy .xls (CFBF) is a
  follow-up.
* attachment_encrypted — flag_bits & 0x01 on any ZIP / OOXML entry's
  central directory; magic-byte match for 7z / RAR / CFBF (encrypted
  Office wrap).
* html_smuggling — structural lxml parse first: fires when an `<a
  download>` element coexists with a `<script>` referencing
  `Blob` / `Uint8Array` / `URL.createObjectURL`. Regex pair-check
  fallback on lxml parse failure (real-world phish HTML is often
  malformed). Cuts the FP rate that pure-regex would produce on
  legitimate "click to download" links.

Add `python3-lxml` (~5 MB Debian package, C-extension, no transitive
Python deps) to the SMTP decky's Dockerfile. simhash stays inline.
Per the dependency rule: lxml earns its weight by cutting R0046's
OR-combined FP rate; a heavier macro-detection lib (oletools ~5 MB
pure-python with msoffcrypto) would not measurably improve the
boolean signal we need, so stdlib stays for that lane.
2026-05-02 19:08:37 -04:00
fb85762703 feat(bus): publish email.received from ingester after SMTP artifact persist
Wires the EmailLifter (R0041–R0048) producer that DEBT.md item #3
deferred. After the existing add_bounty() call in _extract_bounty
(line 615), call _publish_email_received() which:

* resolves the attacker_uuid via repo.get_attacker_uuid_by_ip; drops
  the publish if unresolved (the TTP worker can't anchor orphan
  events)
* projects the message_stored fields onto the EmailLifter wire
  contract: from_domain / mail_from_domain / return_path_domain
  parsed via _domain_of, rcpt_count + rcpt_domains via
  _rcpt_projection, attachment_sha256s + attachment_extensions
  derived from the existing attachments_json manifest, urls from
  urls_json, dkim_signed/spf_pass coerced from 0/1 ints to bool
* mirrors _publish_probe_pending's bus-per-call pattern and
  swallows all exceptions (the bus is the notification layer, not
  the source of truth)

Fires for both relay and non-relay SMTP services. R0041 / R0043 /
R0044 / R0045 are now live end-to-end; R0046 partial (extension
lane). Heavyweight predicates (R0042 simhash, R0046-deep, R0047 /
R0048 body_text) stay deferred per the EmailLifter heavyweight
DEBT entry.
2026-05-02 18:39:13 -04:00
e9324acac7 feat(smtp): emit X-Mailer / Return-Path / dkim+spf / URLs on message_stored
The EmailLifter (R0041–R0048) keys on header-derived signals that the
v0 _summarize_message did not extract. Add cheap Layer 2 projections
inside the existing single-pass parse:

* return_path / x_mailer — direct header reads, decoded RFC 2047
* dkim_signed / spf_pass — booleans derived from any
  Authentication-Results header (multiple lines tolerated; positive
  verdict on any line wins)
* urls — http(s) URLs lifted from text/* body parts via a tight
  regex, deduplicated first-seen-wins, capped at 64 in the wire
  payload to bound the syslog SD value

Heavyweight extraction (body simhash, office-macro detection,
HTML-smuggling, password-protected archives, mal-hash-match,
body_text projection) stays deferred per the EmailLifter heavyweight
DEBT entry — those rules need privacy / extractor decisions before
they ship.
2026-05-02 18:37:11 -04:00
2ce150a53e docs(debt): mark email.received producer as paid; file heavyweight follow-up
The 2026-05-02 paydown wires the producer at ingester.py after
add_bounty(), with the cheap projections (domains, rcpt_count,
attachment_count, x_mailer, dkim/spf, attachment shas + extensions,
URLs). R0041 / R0043 / R0044 / R0045 fire end-to-end after this PR;
R0046 partial.

The remaining lanes (R0042 body_simhash, R0046 macro / smuggling /
password / mal_hash, R0047 / R0048 body_text projection) are filed
as a new entry "EmailLifter heavyweight feature extraction" with the
field map and the privacy-vs-completeness fork on body_text called
out for the next maintainer to pick a side.
2026-05-02 18:24:51 -04:00
9a7d116351 docs(ttp): sync A.10 + rewrite §9 drift runbook + DEBT.md markers
Appendix A.10 corrected to match the post-2026-05-02-audit reality:
AbuseIPDB cat 7/13/16/17 land on their canonical AbuseIPDB names
(Phishing / VPN IP / SQL Injection / Spoofing); cats 4 and 10 carry
explicit "drop" annotations so the next reviewer sees the intent
rather than guessing. ThreatFox table re-keys on `threat_type` (the
canonical taxonomy field) and adds the `payload` and `cc_skimming`
rows. GreyNoise table promotes bare-malicious to a half-multiplier
emission of T1071.

§"Hard parts §9 Intel provider drift" replaces the prose handwave
with a runnable check: provider URLs, the ThreatFox curl invocation
that needs DECNET_THREATFOX_API_KEY, the rule_version + emits +
attack_catalog co-evolution rules, and the full chain of files to
exercise. Adds a "Ship-time audit log" subsection so future quarterly
runs have a known-good baseline to diff against.

DEBT.md item #1 records LAST_REVIEWED: 2026-05-02 / NEXT_REVIEW:
2026-08-02 and points at §9 for the runbook. DEBT.md item #3 (the
attacker.email.received producer) flags its gating premise as
potentially stale — ANTI noted SMTP honeypots already persist
received messages, contradicting the "no source row" claim that
deferred the wiring.
2026-05-02 18:09:20 -04:00
f8dee596e5 fix(ttp): expand R0054/R0055/R0057 emits + LAST_REVIEWED markers
The IntelLifter's _emit_filtered fans out only the rule.emits entries
whose technique_id appears in the predicate's decision set. v1's emits
lists were narrow supersets of the common case, silently dropping the
rest of the predicate's possible emissions:

  R0054 dropped: T1046 (cat 14), T1078 (cat 20), T1090 (cats 9/13),
                 T1496 (cat 11), T1595 (cats 14/19)
  R0055 dropped: T1090 (tor_exit_node), T1110 (ssh_bruteforcer),
                 T1588 (the second emit of every C2-framework tag)
  R0057 dropped: T1105 (payload_delivery, download_url)

Bump rule_version 1->2 on R0054/R0055/R0057, expand emits to cover
every technique the predicate produces. R0056 (Feodo) and R0058
(aggregate bump) carry no enum and stay at v1.

All five YAMLs gain `last_reviewed: "2026-05-02"` and
`next_review: "2026-08-02"` markers; the rule YAML is now the
canonical record of when the mapping was last reconciled against
upstream, with DEBT.md as the calendar reminder.
2026-05-02 18:09:03 -04:00
75ff0ede1f fix(ttp): correct intel_lifter mappings + repoint ThreatFox to threat_type
Three bug classes uncovered by the 2026-05-02 ship-time audit:

* AbuseIPDB code/name mismatch in v1: cat 10 was treated as DDoS (it's
  Web Spam — DDoS is cat 4, intentionally unmapped per A.10) and cat 17
  as VPN IP (it's Spoofing — VPN IP is cat 13). Both typos mirrored in
  code AND the design doc Appendix A.10. Code now matches the AbuseIPDB
  taxonomy exactly; cat 17 retargets to T1566 (email-spoofing as a
  phishing precursor), and cats 7 (Phishing) and 16 (SQL Injection)
  pick up T1566 / T1190 emissions that v1 didn't cover.

* ThreatFox dispatch keyed on `ioc_type` in v1, but `ioc_type` is the
  indicator format (url / domain / hash variants) and carries no ATT&CK
  signal. The canonical taxonomy field per ThreatFox's API is
  `threat_type` (botnet_cc / payload_delivery / payload / cc_skimming).
  Repoint dispatch through the new `threatfox_threat_types` payload
  field; `ioc_type` rides as evidence only. Also adds the missing
  cc_skimming -> T1056 (Input Capture) mapping and registers T1056 in
  attack_catalog.py.

* GreyNoise bare-malicious lane: a `classification == "malicious"` row
  with no recognised tag used to emit nothing. Now lights T1071 at a
  half multiplier, suppressed when a tag already fires T1071 to avoid
  double-stamping at conflicting confidence levels.
2026-05-02 18:08:48 -04:00
a31ad82880 feat(intel): project per-provider taxonomy into attacker.intel.enriched payload
The TTP worker forwards the bus payload verbatim to the IntelLifter as
TaggerEvent.payload. The pre-audit publish payload only carried
{attacker_uuid, attacker_ip, aggregate_verdict, providers}, so even with
the new AttackerIntel taxonomy columns populated the lifter still saw
nothing. Lift the relevant fields (categories / tags / threat_types /
malware family / score / classification) into the bus event and decode
JSON-string list columns back to native lists at the boundary.
2026-05-02 18:08:29 -04:00
999d3494b4 feat(intel): persist per-provider taxonomy on AttackerIntel for TTP dispatch
The 2026-05-02 ship-time audit of the R0054-R0058 intel rule pack found
that AbuseIPDB / GreyNoise / ThreatFox stored only the aggregate verdict
(score / classification / listed-bool) plus the raw response blob. The
TTP IntelLifter expects per-provider taxonomy fields (categories, tags,
threat_types) that were never populated, so R0054 / R0055 / R0057
emitted zero tags in production despite passing unit tests.

Add typed columns: abuseipdb_categories, greynoise_tags, greynoise_name,
feodo_malware_family, threatfox_threat_types, threatfox_ioc_types,
threatfox_malware_families. Each provider now parses the relevant
taxonomy out of the upstream response and writes it through
column_updates. JSON-list columns ride as TEXT with default "[]" to
keep the SQLite/MySQL backend split honest, deserialised back to native
lists by the repo on read.
2026-05-02 18:07:57 -04:00
d1c4a48963 feat(ttp): split bash CMD evidence into structured uid/user/src/pwd/cmd rows
The inspector was dumping the whole `CMD uid=0 user=root src=… pwd=…
cmd=nmap -p- 192.168.1.0/24` syslog body into a single ``command_text``
blob. ANTI: "I'd like to separate the fields." Done — three layers
work together:

1. Collector session aggregator: new `_parse_cmd_msg` splits the bash
   PROMPT_COMMAND msg into `{uid, user, src, pwd, command}`. The
   session-ended envelope's per-command dict now carries the
   structured fields, with `command_text` set to just the cmd= value
   (preserving embedded whitespace — `nmap -p- 1.2.3.0/24` etc.).

2. Rule engine: per-source_kind auxiliary evidence list
   (`_AUX_EVIDENCE_FIELDS`). For `command` events the engine
   automatically promotes uid/user/src/pwd into the persisted
   `evidence` dict on top of the rule's explicit `evidence_fields`.
   Engine-controlled, not per-rule — adding a new aux field is one
   line here, not a 30-rule YAML sweep, and rule authors can't
   accidentally drop it.

3. TTPInspector frontend: evidence renders as a structured
   `kvs` grid (UID / USER / SRC / PWD / CMD rows) instead of
   pretty-printed JSON. Primary-order list keeps shell fields at
   the top; everything else falls below alphabetically so unfamiliar
   evidence shapes still surface predictably.

Tests:
- session_aggregator pins the structured-fields emit (uid/user/src/
  pwd/command_text without "CMD" prefix, embedded whitespace
  preserved).
- rule_engine_tagger pins the aux-field auto-promotion + the
  no-`None`-leakage path when payload doesn't carry an aux key.
2026-05-02 03:20:53 -04:00
84699f89da feat(ttp): show canonical ATT&CK technique names in the TTPs UI
"T1595" alone is opaque; "T1595 — Active Scanning" tells you the
story at a glance. The names come from a backend-side static catalogue
pinned to the same ATT&CK release as the rule engine
(_ATTACK_RELEASE = "v15.1") — names are the canonical MITRE labels,
not author-supplied strings on rules, so a rule author can't typo a
name and the entire fleet sees the typo.

- New `decnet/ttp/attack_catalog.py` with `TECHNIQUE_NAMES` covering
  every technique_id + sub_technique_id emitted by `rules/ttp/`
  (R0001..R0058 → 69 IDs in the v0 pack).
- `IdentityTechniqueRow` / `TechniqueRollupRow` / `CampaignTechniqueRow`
  / `TTPTagDetailRow` gain optional `technique_name` /
  `sub_technique_name` fields. Repo + router populate them from the
  catalogue at row-construction time. None when an ID isn't in the
  catalogue — UI falls back to the bare ID.
- Coverage test (`tests/ttp/test_attack_catalog.py`) walks every
  YAML rule and asserts every emitted ID has a catalogue entry, so
  a future rule author who forgets to update the catalogue gets a
  loud failure rather than a silent UI fallback.

Frontend:
- `TTPsObservedSection` shows "T1595.002 — Active Scanning:
  Vulnerability Scanning" instead of just the ID, with overflow
  ellipsis + tooltip for narrow viewports. Inspector header /
  TECHNIQUE row also surface the names.
2026-05-02 03:10:07 -04:00
42e9492118 feat(ttp): inspector drawer surfaces evidence + rule_id behind each technique
The TTPsObservedSection rollup tells the operator "we saw T1059" but
not why. Click any technique row → side drawer opens listing every
ttp_tag row in scope with the persisted evidence JSON, firing
rule_id / rule_version, source_kind / source_id, confidence, and
created_at. Mirrors the CredentialReuseInspector / BountyInspector
pattern (drawer-backdrop + bd-head/bd-body + kvs grid).

Backend:
- New `GET /api/v1/ttp/tags/by-{scope}/{uuid}/{technique_id}`
  (`scope ∈ {identity, attacker, session}`, optional
  `?sub_technique_id=`, `?limit=` capped to 1000). Returns raw
  TTPTag rows newest-first.
- New `TTPTagDetailRow` Pydantic model + re-export.
- New repo method `list_tags_by_scope_and_technique` on
  TTPMixin (+ abstract on BaseRepository) — single query branched
  on scope; identity scope projects through `Attacker.identity_id`
  the same way `list_techniques_by_identity` does.
- Tests: evidence round-trips, sub_technique filter, JWT-required,
  empty scope, unknown scope rejected.

Frontend:
- New `TTPInspector.tsx` + `TTPInspector.css` (violet accent, slide
  animation, focus-trapped panel matching the existing inspector
  family).
- `TTPsObservedSection`'s TechniqueBar is now click+keyboard
  activatable; clicking opens the inspector for that
  (technique, sub_technique) tuple.

mypy clean. 532 passed in the targeted sweep.
2026-05-02 02:55:05 -04:00
c4e29e3bf9 fix(ttp): resolve attacker_uuid from attacker_ip on bus-event consume
The collector's `attacker.session.ended` envelope carries
`attacker_uuid: null` and `attacker_ip: <ip>` because the collector
doesn't talk to the DB. The TTP worker passed that null straight
through, and `TTPTag.__init__` raised the documented invariant:

    ValueError: ttp_tag requires at least one of attacker_uuid /
                identity_uuid; both NULL is not a valid anchor.

The worker now resolves `attacker_uuid` from `attacker_ip` via
`BaseRepository.get_attacker_uuid_by_ip` before fanning out the
event. When the IP isn't in the DB yet (profiler hasn't ingested
the row), the event is dropped with one log line — better than
exploding mid-tag.

- New `get_attacker_uuid_by_ip(ip) -> str | None` on the repo
  (BaseRepository abstract + AttackersCoreMixin impl).
- `_resolve_attacker_uuid` helper in `decnet/ttp/worker.py` runs
  before `_build_events`. Short-circuits when the payload already
  has either anchor; drops the event when neither anchor is
  resolvable.
- Tests pin: short-circuit on existing uuid/identity, repo lookup,
  drop on unknown IP, drop on "Unknown" sentinel, drop on
  no-anchor payload, drop on repo failure.
2026-05-02 02:44:30 -04:00
f9901befc4 docs(ttp): catalogue producer wiring for every TTP-watched topic
Add a "Producer wiring" subsection under TTP_TAGGING.md §"Bus
topics" mapping every topic the TTP worker subscribes to onto the
file:line that publishes it. Calls out the gap (`email.received`
has no producer today) and the new `attacker.session.ended`
payload shape from the collector aggregator.

Also lists the four producer regression tests added in this series
so a future contributor sees the safety net before staring at the
silent rule engine.

DEBT.md gets the `attacker.email.received` follow-up entry — wire
the producer when SMTP-receive persistence lands, since today the
honeypot relay path doesn't store received emails anywhere a
publisher could read from.
2026-05-02 02:39:23 -04:00
b5ce236cab test(bus): pin scope-(2) producer wiring for reuse / clusterer / intel
Three producer-side regression guards. Each drives the worker's run
loop with a fake bus + stubbed repo and asserts the documented topic
fires when the producer has data:

- reuse correlator → credential.reuse.detected (one finding row)
- clusterer → identity.formed + identity.merged (one ClusterResult)
- intel worker → attacker.intel.enriched (one unenriched attacker
  + a fake provider returning a "malicious" verdict)

These complement commit 1's attacker.session.ended producer test —
together the four cover every TTP-relevant publisher in the tree
(modulo email.received, which has no producer yet; tracked in
DEBT.md).
2026-05-02 02:38:24 -04:00
b043c96d29 feat(collector): publish attacker.session.ended on session_recorded events
The TTP worker subscribes to attacker.session.ended but no upstream
component published it — the rule pack (R0001–R0030) therefore never
fired on live SSH traffic even after the consume-side wiring landed
in E.3.18a/b/c.

The collector now hosts a per-attacker_ip command index
(_SessionAggregator) that watches the same parsed-event stream as
_publish_log. Shell `command` events are appended to a per-IP list;
on `session_recorded` the aggregator slices the list to commands
inside the [ended_at - duration_s, ended_at] window and publishes
attacker.session.ended with the session metadata + commands list.
The TTP worker's _build_events fan-out (E.3.18b) turns each command
into a source_kind="command" TaggerEvent that the RuleEngineTagger
(E.3.18c) matches against R0001–R0030.

Memory bound: per-IP entries TTL-evict at DECNET_COLLECTOR_SESSION_AGG_TTL_SEC
(default 3600 s). Publish failures are swallowed in the aggregator —
a misbehaving bus cannot stall the per-container stream threads.
2026-05-02 02:35:08 -04:00
d9d2a80573 fix(collector): unwrap double-wrapped RFC5424 around bash PROMPT_COMMAND
Honeypot SSH containers run `PROMPT_COMMAND` that calls
`logger --rfc5424 --msgid command -t bash "CMD …"`. The Docker-stdout
reader prepends an outer RFC5424 envelope (HOSTNAME=<decky>,
APP-NAME=1, MSGID=NIL) around that inner syslog line. Both the
collector parser (`parse_rfc5424`) and the correlation parser
(`parse_line`) saw the outer NIL MSGID and emitted `event_type="-"`
for every shell command — which:
  - kept `Attacker.commands` rows missing `command_text`
  - left R0001–R0030 (the pattern rule pack that matches shell
    commands) with no haystack
  - made `decnet.collector.log` show `event written … type=-`
    for the very lines that should be `type=command`

Both parsers now detect the inner-RFC5424 shape (`<TS> <HOST> <APP>
<PROCID> <MSGID> <rest>`) when the outer MSGID is NIL and the SD-arm
is also NIL, and re-extract HOSTNAME / APP-NAME / MSGID / remainder
from the body. The collector parser also recovers the post-SD msg
tail when the SD block isn't `relay@55555` (the bash CMD line carries
a `[timeQuality …]` block) so the kv-fallback can find `src_ip`.

Mirroring tests in tests/collector and tests/correlation pin both
the unwrap and the regression guard for non-double-wrapped lines.
2026-05-02 02:32:21 -04:00
e08bfc4a73 fix(ttp): /api/v1/ttp/rules returns the live rule catalogue
The endpoint was a contract-phase stub returning `[]` even though the
RuleStore loaded all 58 YAML rules at worker startup. UI saw an empty
table; operators couldn't tell whether anything was wired up.

- `api_list_rules` now calls `get_rule_store().load_compiled()` and
  serializes each CompiledRule + its operational state into a
  RuleCatalogueRow. Sorted by rule_id for stable golden snapshots.
- Add `description: str` to RuleSchema (pydantic) and CompiledRule
  (NamedTuple, defaulted) + propagate through `_compile_one` so the
  catalogue surfaces the human-readable YAML description, not just
  the slug-style `name`.
- Update `tests/ttp/test_rule_engine.py` _fields assertion for the
  new column; new `tests/api/ttp/test_rules_catalogue.py` pins the
  catalogue contents (R0001/R0014 presence, row shape, sort order).

Worker behaviour is unchanged: it was already loading rules
correctly. This is purely a read-side wiring fix on the operator API.
2026-05-02 01:54:06 -04:00
7ab0df3680 chore(cleaning): deleted swp vimfile 2026-05-02 01:39:17 -04:00
ca1e04033c docs(ttp): E.5 verification log appended to TTP_TAGGING.md
Closes the CDD design phase. Records:
- §E.1 contract inventory (every file exists, compileall clean).
- Targeted pytest pass: 604 passed, 1 skipped, 10 xfailed
  (all xfails are `xfail(strict=True)` with reason= pointing to the
  impl step that flips them; carry-overs, not flakes).
- Strict mypy over decnet/ttp + decnet/cli/ttp.py +
  decnet/web/router/ttp + decnet/web/db/sqlmodel_repo/ttp.py: clean.
- Stranger-readability spot check on tests/ttp/: no doc bugs.

Notes the three pre-E.4 wiring fixes (E.3.18a/b/c) and the E.4
backfill CLI / DEBT entries that landed in this series.
2026-05-02 01:37:45 -04:00
7d1f048764 docs(ttp): E.4.b/E.4.c DEBT entries — provider review + Sigma deferral
Quarterly TTP provider mapping review for AbuseIPDB / GreyNoise /
abuse.ch (Feodo Tracker, ThreatFox) catalogue drift against
`rules/ttp/R0054..R0058`, and the post-v1 trigger for the Sigma rule
adapter. Both items reference TTP_TAGGING.md sections so the
rationale stays linked to the design doc.
2026-05-02 01:35:49 -04:00
301d3feee9 feat(ttp): E.4.a extract decnet/cli/ttp.py with worker run + backfill CLI
The TTP worker entry moved out of decnet/cli/workers.py into its own
module so the TTP CLI surface (worker + admin verbs) is colocated,
mirroring decnet/cli/canary.py / webhook.py / swarm.py.

- New `decnet/cli/ttp.py` with `decnet ttp` (worker, ExecStart-stable
  for decnet-ttp.service) and `decnet ttp-backfill --since-days N`.
- `decnet ttp-backfill` walks Attacker.commands and CanaryTrigger
  history, dispatches each row through the live CompositeTagger,
  persists tags via repo.insert_tags (idempotent INSERT OR IGNORE).
  --dry-run / --source command|canary|all / --batch-size supported.
- Backfill deliberately bypasses bus publish — historical replay
  must not re-trigger SIEM/webhook fan-out per TTP_TAGGING.md
  §"Bus topics" loop-prevention invariant.
- Added `iter_attacker_commands_since` / `iter_canary_triggers_since`
  read-only iterators on TTPMixin + abstract bindings on
  BaseRepository.
- Master-only via gating; both `ttp` and `ttp-backfill` listed in
  MASTER_ONLY_COMMANDS.
2026-05-02 01:35:17 -04:00
e84b522fd3 feat(ttp): E.3.18c wire RuleEngine via RuleEngineTagger
The canonical rule-based engine from §"Tagging engines, layered §1"
of TTP_TAGGING.md was fully implemented but never instantiated as a
composite child — pure pattern rules (R0014/R0017/R0023/... 23 rules
total) had no tagger to dispatch them.

- Add `RuleEngineTagger(Tagger)` adapter in rule_engine.py wrapping
  `RuleEngine.evaluate()`. `HANDLES = {command, http_request,
  auth_attempt, payload}` — the source kinds whose rules typically
  live outside any per-source lifter.
- Adapter's `watch_store()` filters via `_is_engine_owned` so the
  engine's dispatch index excludes lifter-claimed rules
  (`match.kind: lifter:*`) and stays disjoint from per-lifter ownership.
- Prepend `RuleEngineTagger` to the `CompositeTagger` lifter list so
  generic pattern rules dispatch before per-source cross-event logic.
- Composes with E.3.18a (worker hydrates `watch_store`) and E.3.18b
  (worker fans session payloads into per-`command` events) — together
  these three commits make R0001–R0030 actually fire at runtime.
2026-05-02 01:29:58 -04:00
65435f1427 feat(ttp): E.3.18b worker fans session-ended payloads into per-command events
R0001–R0030 declare `applies_to: [command]` and match per command, not
per session. The worker now translates one `attacker.session.ended`
payload carrying a `commands: list` into:
  - one source_kind="session" event (behavioral / cross-event lifters)
  - one source_kind="command" event per command (RuleEngineTagger)

Both string and dict command shapes are accepted; dicts contribute
their `id` / `uuid` / `command_id` as the per-command source_id so
the deterministic `compute_tag_uuid` keeps replays idempotent. Tags
from session + per-command dispatch are aggregated into a single
`ttp.tagged` envelope per upstream session.
2026-05-02 01:27:37 -04:00
44ade3eb63 fix(ttp): E.3.18a worker hydrates per-lifter rule indexes via watch_store
Each per-source lifter holds its own RuleIndex and exposes an
`async watch_store()` that loads the corpus and drains store change
events forever. Until this commit nothing called `watch_store()` in
production — every dispatch index stayed empty and no rule fired.

- Add `WatchableTagger` runtime-checkable Protocol in `decnet.ttp.base`.
- `CompositeTagger.iter_watchables()` yields lifters that satisfy it.
- `run_ttp_worker_loop` fans out one task per watchable, cancelled
  and awaited alongside pump/heartbeat/control in the existing finally.
- Watch failures log and exit the watch task without taking the
  worker down — mirrors the pump-task tolerance contract.
2026-05-02 01:25:15 -04:00
9a31d0e50c feat(ttp): E.3.17 worker registration + scoped schemathesis suite
Wires decnet-ttp as a first-class worker:

* `decnet ttp` CLI command (master-only via MASTER_ONLY_COMMANDS)
* deploy/decnet-ttp.service.j2 systemd unit (After= identity / intel
  / reuse-correlator workers; ProtectHome=read-only since
  FilesystemRuleStore only reads ./rules/ttp/)
* deploy/decnet.target Wants= chain extended with decnet-ttp.service
* `ttp` was already in web/worker_registry.KNOWN_WORKERS

tests/api/test_schemathesis_ttp.py: TTP-routes-only schemathesis
suite, filtered via the OpenAPI tags=["TTP Tagging"] annotation
shared by the eight TTP routes. Reuses the live uvicorn subprocess
the wider test_schemathesis spawns; max_examples=400 keeps the
focused gate fast for E.3.13–E.3.16 iteration.

wiki-checkout/Service-Bus.md committed in its own repo: ttp.tagged
and ttp.rule.fired.<id> flipped from "reserved (TTP worker)" to
"decnet.ttp.worker" now that the worker publishes them.
2026-05-01 21:26:46 -04:00
07a609973b feat(ttp): E.3.16 frontend TTP UI
TTPsObservedSection.tsx: shared analyst-facing rollup. scope=
identity drives /ttp/by-identity/{uuid} (primary, with Navigator
export download); scope=attacker drives /ttp/by-attacker/{uuid}
(per-IP slice). Tactic → technique tree in fixed UKC-aligned order,
counts and confidence-weighted bars. Literal "NO TECHNIQUES
OBSERVED YET" empty state per TTP_TAGGING.md §"UI surface — Empty
state": no spinner, no fallback list.

RuleStateControls.tsx: admin-only rule operational state panel
backed by POST/DELETE /ttp/rules/{rule_id}/state. Server-gated by
require_admin AND client-gated on /config?.role so a non-admin
never sees the controls (per feedback_serverside_ui.md the client
gate is UX, not security — the server returns 403 either way).
Wired into Config.tsx as a new "TTP RULES" admin tab.

Wired TTPsObservedSection into IdentityDetail (above fingerprints)
and AttackerDetail (above TIMELINE). DeckyFleet/PersonaGeneration
vocabulary throughout (logs-section / section-header / btn /
matrix-text / dim-chip).

tsc --noEmit and vite build clean.

The dev-server browser smoke is deferred per the "can't reliably
exercise UI from this harness" reality — typecheck + build is the
correctness gate, not feature verification.
2026-05-01 21:05:28 -04:00
403d83faba feat(ttp): E.3.15 UKC bridge — production phase-handoff edge fires
Add BaseRepository.list_ttp_decky_phases(identity_uuid) returning
per-decky tag observations as (decky_id, tactic, created_at_ts) rows
ordered by creation time. Rewrite from_identity_row() to project
tactic → UKCPhase via tactic_to_ukc_phase and populate the four
phase-handoff maps (first/last_phase_per_decky,
first/last_seen_per_decky) so combined_campaign_weight finally lights
up on real DB rows — not just synthetic fixtures.

ConnectedComponentsCampaignClusterer.tick() pulls each active
identity's per-decky phase observations before projecting features.
Repo failures are non-fatal: a partial repo falls back to the empty
phase-handoff signal (legacy behavior) so the worker stays up.

tests/clustering/test_ttp_phase_handoff.py pins the production-row
pair clearing CAMPAIGN_EDGE_THRESHOLD on a C2 → DISCOVERY hand-off —
the trip-wire that says the whole project paid off.

commands_by_phase_on_decky itself stays empty on the production path:
it is consumed only by the synthetic-fixture similarity surface, and
the phase-handoff edge does not use it. Synthetic fixtures still
populate it directly via from_synthetic_identity.
2026-05-01 21:01:58 -04:00
101127247e feat(ttp): E.3.14 worker bootstrap (insert + ttp.tagged publish)
Inner loop drains a per-process asyncio.Queue populated by one pump
task per topic in _TOPICS, dispatches each event through
CompositeTagger, persists via repo.insert_tags(), and publishes
ttp.tagged + per-technique ttp.rule.fired.<id> only when the insert
returned a non-zero rowcount.

CompositeTagger seeded with all six lifters (Behavioral, Intel,
CanaryFingerprint, Email, Identity, Credential).

Loop-prevention invariant from TTP_TAGGING.md §"Bus topics" enforced:
N replays of the same upstream event publish exactly one ttp.tagged
event. test_worker_bus covers both the direct invocation path and
the idempotency replay path.

Intel catch-up via attacker.session.ended is intentionally deferred
to E.3.14b — needs a session→intel join the repo doesn't expose yet.
2026-05-01 20:57:57 -04:00
322fd44d72 feat(ttp): E.3.13 IdentityLifter + CredentialLifter (R0001-R0006)
IdentityLifter owns lifter:identity_* — currently R0003 (password
spraying). CredentialLifter owns lifter:credential_* — R0001 generic
auth brute, R0002 password guessing, R0004 credential reuse, R0005
valid-account use, R0006 default credentials.

YAMLs R0001/R0002/R0003/R0005/R0006 had their match.kind normalised
to fit the lifter prefix scheme — the design doc's promised "YAMLs
normalised in a separate refactor commit" lands here.

Identity-rollup tags null out attacker_uuid on emit so the worked-
example invariant holds (the tag belongs to the Identity, never to
one member IP).

Tests: test_identity_lifter.py + test_credential_lifter.py cover
each predicate's positive/negative path, state modulation
(disabled/clipped/expired), source-kind gating, and idempotent
replay. test_lifter_absence and test_lifters updated for the new
ctor signature.
2026-05-01 20:52:56 -04:00
62ad76615e docs(ttp): mark E.3.9-E.3.12 lifters done
Records the RuleIndex extraction prerequisite, the lifter:<owner>_
prefix routing convention, per-provider technique fan-out logic for
intel rules, the canary identity-merge guard rail, and the email PII
allowlist + R0042 simhash requirement.
2026-05-01 20:31:47 -04:00
7a89fbb357 feat(ttp): E.3.12 EmailLifter (R0041-R0048)
SMTP message-level technique tagger per Appendix A.6: open relay abuse
(rcpt_count + foreign From), mass phishing (rcpt_count + body simhash),
phishing-kit X-Mailer, IDN/punycode URL, sender masquerade composite
(From/Return-Path/DKIM/SPF), malicious attachment (macro/.lnk/.iso/.img/
hash match), BEC subject+body composite, encoded payload in body.

PII discipline (TTP_TAGGING.md §'Hard parts §6') is enforced at the
lifter layer via _filter_evidence(): emitted TTPTag.evidence is
restricted to the EmailEvidence-allowed allowlist (body_sha256,
matched_headers — names only, rcpt_domain_set — domains only,
attachment_sha256s, rcpt_count) plus PII-safe match discriminators
(matched_kit, matched_trigger, matched_url_host, etc). Raw addresses,
raw body bytes, full URLs, and decoded base64 previews NEVER appear in
evidence — defense-in-depth over the YAML evidence_fields hint.

Tests: tests/ttp/test_email_lifter.py per-rule positive + negative +
PII allowlist guard + state modulation. tests/ttp/rule_precision/
test_email_rules.py xfail flipped to real precision (R0041-R0048
H-band ≥95%). Corpus rows updated to acknowledge that R0045 (masquerade)
co-fires with R0041 / R0047 when the sender-masquerade signals are
present alongside open-relay or BEC patterns — overlap is by design,
not a precision bug.
2026-05-01 20:31:03 -04:00
f211d394e6 feat(ttp): E.3.11 CanaryFingerprintLifter (R0049-R0053)
Browser-payload derivations per Appendix A.9: navigator.webdriver flag,
canvas/audio/WebGL automation hash matches (Puppeteer/Playwright/
Selenium/curl-impersonate), WebRTC IP leak, TZ/language vs source-IP
geo mismatch, navigator.platform vs userAgent vs WebGL renderer
inconsistency.

Evidence shape pinned to CanaryFingerprintEvidence (metric +
matched_signature) — raw fingerprint blobs (canvas hashes, full UAs,
navigator.platform values) explicitly NOT carried into TTPTag.evidence
per TTP_TAGGING.md §'Hard parts §7' (enrichment vs tag boundary). The
identity-merge guard rail is preserved: composite fp.id matches across
IPs are NOT a TTP, so no rule fires on the bare hash.

Tests: tests/ttp/test_canary_fingerprint_lifter.py per-rule positive +
negative + evidence-shape guard + state modulation.
tests/ttp/rule_precision/test_canary_rules.py xfail flipped to real
precision (R0049/R0050/R0051/R0053 H-band ≥95%; R0052 M-band ≥80%).
2026-05-01 20:25:57 -04:00
7865e71aa9 feat(ttp): E.3.10 IntelLifter (R0054-R0058)
Per-provider verdict translator for AbuseIPDB, GreyNoise, Feodo Tracker,
and ThreatFox per Appendix A.10. Each rule's predicate inspects payload
fields produced by the enrich worker (no DB I/O, no decnet.intel.*
imports — E.2.7 decoupling guard preserved). AbuseIPDB confidence is
scaled by abuse_confidence_score / 100; categories drive per-technique
fan-out. R0058 aggregate-bump is a no-op in v0 (cross-tag bump deferred
to E.3.14 worker bootstrap).

Per-provider null tolerance is the steady state — a missing provider
column produces zero tags from that rule, never an error.

Tests:
- tests/ttp/test_intel_lifter.py — per-provider positive + negative +
  state modulation + decoupling source-import guard.
- tests/ttp/rule_precision/test_intel_rules.py — xfail flipped, real
  precision driven over seed_intel.jsonl (R0054-R0057 H-band ≥95%;
  R0058 skipped as bump-only).
- tests/ttp/test_lifter_absence.py — IntelLifter all-populated test
  flipped from xfail-strict to real assertion with realistic payload.
- tests/ttp/test_lifters.py — partial-null xfail flipped to real
  assertion.
2026-05-01 20:23:42 -04:00
eff3e4bce7 feat(ttp): E.3.9 BehavioralLifter (R0031-R0040)
Reads pre-shaped session aggregates from TaggerEvent.payload and emits
techniques per Appendix A behavior tables. Per-rule predicates dispatch
on match.kind (lifter:behavioral_<name>); the lifter holds its own
RuleIndex watching the same RuleStore as the engine, so disable / clip /
TTL state reaches lifter-bound rules through the same atomic-swap path.

R0032/R0036/R0037/R0040 YAMLs had over-escaped regex strings (\\
instead of \\) — fixed in place.

Factory wired so default get_tagger() returns CompositeTagger with
BehavioralLifter shipped; remaining three lifters (E.3.10-E.3.12) land
in subsequent commits.

E.2.6 contract preserved via TolerantTagger: empty payload steady-state
yields [] with zero ERROR records. Disabled / clipped / expired state
verified.
2026-05-01 20:17:59 -04:00
321ea7a2a6 refactor(ttp): normalise lifter:<owner>_<name> match.kind prefix
E.3.9.1 prerequisite. Rules R0031-R0040 now use lifter:behavioral_*,
R0041 (open_relay) uses lifter:email_open_relay; the rest of the email,
canary, and intel cohorts already conformed. Each lifter at E.3.9-E.3.12
will claim its rules via str.startswith('lifter:<owner>_'), keeping the
ownership routing explicit and trivially extensible.

R0001-R0006 / R0030 lifter:* rules are E.3.13 (Identity/Credential)
territory and stay as-is.
2026-05-01 20:10:33 -04:00
e7531ee756 refactor(ttp): extract RuleIndex from RuleEngine
E.3.9.0 prerequisite for the per-source lifters (E.3.9-E.3.13). The
dispatch index, install/evict/apply_change atomic-swap protocol, and
state-modulation helpers (is_active / apply_ceiling) move out of
rule_engine.py into _rule_index.py and _state.py. RuleEngine wraps a
RuleIndex; back-compat shims preserve _by_kind / _by_rule / _install
attribute access for tests poking at the dispatch internals.

Lifters in E.3.9-E.3.12 will each hold their own RuleIndex, watching
the same RuleStore via subscribe_changes() fan-out. Hot-reload
semantics (disable / clip / TTL via set_state API) now reach
lifter-bound rules through the same atomic-swap path the engine uses,
not a future composite-rebuild compromise.
2026-05-01 20:09:18 -04:00
b819dfefa3 feat(ttp): E.3.8 R0054-R0058 intel cohort + mark step done
5 YAMLs for the intel-verdict cohort per Appendix B / A.10:
AbuseIPDB category mapping, GreyNoise classification, Feodo
Tracker hit, ThreatFox IOC type, aggregate-malicious bump-only.
IntelLifter (E.3.10) consumes by rule_id and tolerates absence
silently (null provider column → no tag).

R0058 is the meta bump-only rule — emits a single confidence=0.0
sentinel so it validates and surfaces in the catalogue, but the
repository's sub-0.3 drop ensures no fresh tag persists if the
fanout fires accidentally. test_intel_rules.py pins that
zero-confidence invariant.

Marks E.3.8 done in development/TTP_TAGGING.md with the cohort-
split summary.
2026-05-01 09:22:48 -04:00
dc1867315d feat(ttp): E.3.8 R0049-R0053 canary fingerprint cohort
5 YAMLs for the canary-fingerprint cohort per Appendix B / A.9:
navigator.webdriver flag, automation canvas/audio/WebGL hash match,
WebRTC IP leak, TZ/lang vs geo mismatch, platform inconsistency.
CanaryFingerprintLifter (E.3.11) consumes by rule_id.

test_canary_rules.py: YAML-present + inert-in-v0 + xfail(strict)
gated on E.3.11.
2026-05-01 09:21:01 -04:00
1ad15470a1 feat(ttp): E.3.8 R0041-R0048 email cohort
8 YAMLs for the email cohort per Appendix B: open-relay abuse,
mass phishing, phishing-kit X-Mailer signatures, IDN/punycode
URLs, sender masquerade, malicious attachment, BEC, encoded
payload in body. EmailLifter (E.3.12) consumes by rule_id.

test_email_rules.py: YAML-present + inert-in-v0 + xfail(strict)
precision case gated on E.3.12.
2026-05-01 09:19:56 -04:00
806301e179 feat(ttp): E.3.8 R0031-R0040 behavioral cohort
10 YAMLs for the behavioral / cross-event cohort per Appendix B:
beaconing, data destruction, ransom note, web exfil, DB mass-read,
credentials-in-files, k8s SA token harvest, Docker host escape,
LLMNR poisoning, TFTP router-config retrieval.

Every rule is lifter-bound (BehavioralLifter / IdentityLifter) —
the v0 RuleEngine cannot count, aggregate, or compose cross-event
signals, so these YAMLs declare the technique mappings the lifter
will consume by rule_id at E.3.9. Their match specs use a
'kind: lifter:*' shape inert to the regex matcher.

test_behavioral_rules.py asserts each YAML compiles, none fire
from the v0 engine (FP regression guard against a YAML drifting
into a regex), and an xfail(strict=True, reason='impl phase E.3.9')
precision case that will flip green when the lifter lands.
2026-05-01 09:18:27 -04:00
b1fe1f9403 feat(ttp): E.3.8 R0001-R0030 command cohort
30 YAMLs for the shell/command rule cohort per Appendix B (rules/ttp/).
Splits into engine-active (R0007-R0029, regex on command_text /
raw_url / user_agent) and lifter-bound (R0001-R0006, R0030 — the
v0 RuleEngine cannot count auth attempts, do identity rollups, or
parse fingerprint blobs; the BehavioralLifter / IdentityLifter /
CredentialLifter consume them by rule_id at E.3.9 / E.3.13).

test_command_rules.py asserts:
- every R000N has a YAML that compiles
- lifter-bound rules NEVER fire from the v0 engine (regression
  guard against a YAML drifting into a regex match.spec)
- engine-active rules meet their Appendix-C precision target
  against the seed corpus (≥0.95 high-conf, ≥0.80 medium)

Conftest fixes: precision_engine moved to module-scope so module-
scope precomputed dispatch fixture (fired_by_label) can request it;
_RULES_DIR path bumped from parents[2] to parents[3] so the loader
resolves the project root regardless of pytest cwd; make_event
synthesizes attacker_uuid so TTPTag's anchor invariant is satisfied.

Seed corpus broadened: positive examples for every regex rule plus
6 negative examples across innocuous shell verbs (ls, echo, cd, ps,
df, free) so FPs surface in precision rather than passing vacuously.
2026-05-01 09:16:38 -04:00
c635478442 feat(ttp): E.3.8 corpus + harness — labelled holdout fixture
Sub-step preceding the rule-pack commits per TTP_TAGGING.md:2967.
Adds the per-rule precision suite scaffolding under
tests/ttp/rule_precision/:

- conftest.py: precision_engine fixture (RuleEngine populated from
  ./rules/ttp/), corpus_loader (real → seed → empty fallback),
  precision_for() helper for TP/FP accounting.
- _build_corpus.py: extractor for a real prod corpus pull. Mandatory
  --exclude-ip / DECNET_TTP_CORPUS_EXCLUDE_IPS — operator IPs never
  end up in the committed exclusion list. Pulls both 'command' and
  'unknown_command' event types.
- corpus/seed_*.jsonl: synthetic seed rows for each cohort so the
  harness exercises in clean checkouts.
- corpus/*.jsonl (operator-built) is gitignored.
- test_corpus_loads.py: sentinel that every seed file parses.
2026-05-01 09:08:07 -04:00
ed3f340ea8 feat(ttp): E.3.7 RuleEngine — evaluate + atomic-swap watch_store
Implements the rule engine body left empty at contract phase: evaluate()
dispatches by source_kind through self._by_kind, runs the rule's match
spec against event.payload, and emits one TTPTag per emits entry.
watch_store() loads the initial corpus from RuleStore.load_compiled,
then drains subscribe_changes, applying definition changes via
single-statement dict assignment (atomic swap, GIL-atomic to readers)
and state changes via NamedTuple._replace on the existing CompiledRule.

Why: with the FS + DB stores in place (E.3.5/E.3.6), the engine is the
last piece of the rule plane. Lifters (E.3.9–E.3.13) consume the
engine; the worker bootstrap (E.3.14) wires watch_store into the
asyncio event loop. After this commit a CompositeTagger constructed
with a RuleEngine + a populated rules dir will produce real tags.

Notes:
- CompiledRule.emits extended to 4-tuple
  (technique_id, sub_technique_id, tactic, confidence). Tactic + confidence
  ride per-emit so a single rule can carry multiple precision targets
  (the "one event maps to many techniques" property). Compile helpers in
  both backends extract them from the YAML emits dict; missing tactic
  or confidence is a deploy-time error.
- v0 match operator is "pattern" (regex). The field defaults per
  source_kind (command_text / raw_url / subject / verdict / …) and is
  overridable via match.field. Future ops (contains, equals, in_set)
  extend _match_event without touching the engine surface.
- Confidence model: rules with state="clipped" + confidence_max set
  cap the per-emit confidence downward; clipped is a soft suppress, not
  a hard skip. Disabled rules are skipped wholly; expires_at past is
  re-checked at evaluate as defense-in-depth (the store auto-reverts,
  but a racing read between expiry and revert must not fire the rule).
- _span(name, **attrs) helper in engine + both stores short-circuits on
  decnet.telemetry._ENABLED — matches the project's @traced /
  wrap_repository zero-overhead-when-disabled pattern instead of relying
  solely on the no-op tracer indirection.
- Late-bound tracer (telemetry.get_tracer called per-span, not at
  module load) so test_tracing's monkeypatch reaches the production
  code path.

xfails flipped: tests/ttp/test_rule_engine.py multi-emit fan-out +
rule_version-collision-via-engine; tests/ttp/test_multi_mapping.py
N×M engine fan-out + idempotent replay; tests/ttp/test_tracing.py
ttp.eval span hierarchy + ttp.rule.fire span attributes.

Tests: 214 passed, 19 xfailed (gated on E.3.8 lifters / rule pack /
worker bootstrap).
mypy: clean on prod code; pre-existing test-stub arg-type warnings
unchanged.
2026-05-01 08:49:15 -04:00
8a93ee3129 feat(ttp): E.3.6 DatabaseRuleStore — ttp_rule/ttp_rule_state + master sync
Implements the DB-backed rule store body left empty at contract phase:
load_compiled reads from ttp_rule + ttp_rule_state; get_state /
set_state hit ttp_rule_state with the same expires_at auto-revert and
bus-event semantics as the FS backend; subscribe_changes returns a
per-subscriber queue. State persists across process restarts — the
swarm property the FS backend deliberately doesn't have.

Also lands two swarm-mode helpers:
- sync_from_filesystem(fs_store) — master-side, subscribes to a
  FilesystemRuleStore and projects each RuleChange onto a ttp_rule
  upsert/delete.
- tail_db(poll_interval) — worker-side, watermark poll over
  ttp_rule.updated_at; emits RuleChange("definition", ...) for each
  row that moved.

Why: swarm mode needs rule definitions and operator state to
propagate across hosts. The filesystem backend (E.3.5) was the
single-host-dev variant; this one survives restart and serves N
workers from a shared DB.

Notes:
- DatabaseRuleStore() with no args lazy-inits an in-memory SQLite
  repo so the conformance fixture works without test plumbing. In
  production the worker bootstrap (E.3.14) passes an explicit repo.
- The conftest.py rule_store fixture became async (pytest_asyncio),
  per-backend creates/initializes a SQLite repo for the DB run.
- Adds a `seed_rule(store, rule_id, yaml)` helper to bridge backend
  semantics: drop a YAML file (FS) vs insert a ttp_rule row (DB).
  Used by the parametrized load_compiled conformance test.
- Late-bound _tracer() in both backends (was module-level get_tracer
  binding) so test_tracing's monkeypatch of decnet.telemetry.get_tracer
  actually affects span output.

xfails flipped: tests/ttp/store/test_database.py set_state-writes-to-
ttp_rule_state + filesystem-to-DB sync; tests/ttp/store/test_conformance.py
DB-side load_compiled / set_state isolation / round-trip / per-rule
fan-out / expired-state revert / set_state failure / get_state default
(was xfail-only-on-DB);  tests/ttp/test_tracing.py set_state span
hierarchy.

Tests: 208 passed, 25 xfailed (gated on E.3.7 + lifters).
mypy: clean on all touched files.
2026-05-01 08:39:46 -04:00
f41995a229 feat(ttp): E.3.5 FilesystemRuleStore — inotify hot-reload + per-rule events
Implements the filesystem-backed rule store body left empty at contract
phase: YAML parse + Pydantic validation, asyncinotify watch over
./rules/ttp/, in-process state cache with auto-revert on expires_at,
and a subscribe_changes() async iterator yielding one RuleChange per
per-rule edit. Bus topic builders ttp_rule_reloaded / ttp_rule_state
ship alongside.

Why: the rule plane needed a store before the engine (E.3.7) could
consume RuleChange events and atomically swap compiled rules into its
dispatch index.

Notes:
- Linux-only by construction (asyncinotify wheel gated by sys_platform
  marker; FilesystemRuleStore.__init__ raises on non-Linux).
- Filename allowlist is the FIRST check on every inotify event.
- Content-hash dedup so a single write firing IN_CREATE + IN_CLOSE_WRITE
  produces exactly one RuleChange.
- All compile work serializes on a single asyncio.Lock.
- Subscribers register their queue eagerly so events fired between
  subscribe_changes() and the first __anext__() are buffered.

xfails flipped: per-save-style + filter-ordering + atomic-swap in
test_filesystem.py; load_compiled / set_state isolation / round-trip /
per-rule fan-out / expired-state revert / set_state failure semantics
in test_conformance.py (FS side; DB side stays xfail until E.3.6);
malformed-YAML compile-time check in test_rule_engine.py.

Tests: 197 passed, 35 xfailed (gated on E.3.6 / E.3.7 / lifters).
mypy + bandit: clean on all touched files.

Wiki update for the per-rule reload + state-change topics lands in a
matching wiki-checkout/Service-Bus.md edit (separate repo).
2026-05-01 08:31:05 -04:00
89ce893792 feat(ttp): E.3.4 API handlers wired to repo (rollups + Navigator)
Five GET rollup endpoints (techniques, by-identity, by-attacker,
by-campaign, by-session) and the Navigator export (fleet +
per-identity) now call into the TTPMixin methods. Rule catalogue
endpoint still returns [] — backed by the RuleStore which lands
at E.3.5/E.3.6.
2026-05-01 08:06:53 -04:00
fee697694d feat(ttp): E.3.3 repository — insert_tags + listing rollups (dual backend)
Dialect-split: portable rollup queries on TTPMixin; bulk insert with
ON CONFLICT DO NOTHING / INSERT IGNORE in the per-dialect repos.
Confidence-floor (< 0.3) drop applied at mixin layer before the
dialect hook. BaseRepository now declares the six TTP methods abstract.

Tests in tests/web/db/test_ttp_repo.py flipped from pytest.fail stubs
to real dual-backend behavioral tests; tests/ttp/test_confidence.py
drop-below-floor xfail removed.
2026-05-01 08:04:46 -04:00
226b3adfa2 docs(ttp): mark E.3.1 + E.3.2 done — schema/bus verification 2026-05-01 07:57:38 -04:00
3664ea7008 docs(ttp): mark E.2.9–E.2.14b as done in design doc
Each section gets a Status:  done block summarising what's GREEN
today vs xfail-gated and noting any divergence from the doc's
original wording (E.2.9 lossy observable phases; E.2.13 db_backends
fixture landed alongside; E.2.14a Jaeger-skip + tracing-enabled
plumbing; E.2.14b NamedTuple AttributeError vs FrozenInstanceError).
2026-05-01 07:47:01 -04:00
0217319423 test(ttp): E.2.14b RuleStore conformance — cross-backend + filesystem-specific + database-specific
tests/ttp/store/conftest.py — parametrized rule_store fixture over
FilesystemRuleStore (skipped on non-Linux) + DatabaseRuleStore.

test_conformance.py — shared assertions (default-state, set_state
isolation/round-trip, subscribe_changes per-rule fan-out, expires_at
auto-revert, set_state failure semantics) parametrize over both.
get_state-default GREEN today on FS (returns RuleState() for empty
cache); rest xfail-gated behind E.3.5/E.3.6.

test_filesystem.py — inotify mask + canonical kernel values + 9
scratch-filename rejections + 4 valid-filename acceptances +
fullmatch anchor + tmp_path construction + CompiledRule frozen
property GREEN today; per-save-style + filter-ordering +
atomic-swap concurrency xfail-gated.

test_database.py — class-level surface (no platform guard, ABC
methods concrete, async coroutines) GREEN today; ttp_rule_state
write + filesystem→DB sync xfail-gated behind E.3.6.
2026-05-01 07:45:32 -04:00
bf5414c0d1 test(ttp): E.2.14a follow-up — force DECNET_DEVELOPER_TRACING=true, skip when Jaeger unreachable
Session-scoped autouse fixture in tests/ttp/conftest.py sets
DECNET_DEVELOPER_TRACING=true and forces decnet.telemetry._ENABLED
so the no-op tracer doesn't silently swallow emitted spans. The
span_exporter fixture also monkeypatches decnet.telemetry.get_tracer
so production code under test lands spans in the in-memory
exporter. Tracing tests skip when DECNET_OTEL_ENDPOINT (default
localhost:4317) isn't reachable so the dev loop stays green
without lying about coverage.
2026-05-01 07:42:22 -04:00
f4fe6fe6e4 test(ttp): E.2.14a observability tracing — span hierarchy + no-PII property
In-memory span exporter fixture wired to a per-test TracerProvider
(OTEL global is locked once set, so each test gets its own).
ttp.eval / ttp.lifter.{name} / ttp.rule.fire / ttp.rule.state.change
hierarchy + no-PII canary battery xfail-gated behind E.3.5–E.3.13.
2026-05-01 07:40:58 -04:00
4a93e16407 test(ttp): E.2.13 repository tests — TTPMixin idempotency + identity-rollup projection on dual backends
Adds tests/web/db/conftest.py with a db_backends fixture
parametrizing SQLite (always) + MySQL (gated on
DECNET_TEST_MYSQL_URL). Surface assertions (mixin methods present
+ async) GREEN today; insert_tags idempotency, identity rollup
projection, attacker-rollup exclusion of NULL-attacker tags
xfail-gated behind E.3.3.
2026-05-01 07:39:16 -04:00
6814949bc0 test(ttp): E.2.12 worker bus integration — _TOPICS equality, loop-prevention, delivery asymmetry
Pin _TOPICS frozenset against documented set (single source of
truth). Worker→engine invocation, loop-prevention invariant,
attacker.enriched/email.received catch-up asymmetry xfail-gated
behind E.3.14.
2026-05-01 07:37:58 -04:00
c276b5696e test(ttp): E.2.11 multi-mapping property — N×M fan-out, idempotent UUID, replay-safety
Hypothesis property: N rule_ids × M technique_ids on one event yield
N×M distinct tag UUIDs. Worked example pinned: one rule emitting
(T1110, None) and (T1078, None) → two distinct UUIDs. Engine-level
fan-out + replay xfail-gated behind E.3.7.
2026-05-01 07:36:19 -04:00
fd81be0bb1 test(ttp): E.2.10 confidence model — downward-only multiplier property, drop-below-0.3, AbuseIPDB-30 worked example
Pure-arithmetic adjustment formula pinned via Hypothesis property
test (multiplier ∈ [0,1] cannot raise base). Drop-at-floor and
provider-score multiplier xfail-gated behind E.3.3 / E.3.10.
2026-05-01 07:34:58 -04:00
79e6df8343 test(ttp): E.2.9 UKC bridge bijection — pin tactic↔phase mapping, observable round-trip, lossy phases
Pre-target phases (RECONNAISSANCE/RESOURCE_DEVELOPMENT/WEAPONIZATION/
SOCIAL_ENGINEERING) and observable-but-unmappable phases (EXPLOITATION/
PIVOTING/OBJECTIVES, UKC-only concepts ATT&CK lacks tactics for) are
pinned as lossy via _LOSSY_INVERSE_REFERENCE so a future contributor
cannot 'fix' the asymmetry without tripping the suite.
2026-05-01 07:33:47 -04:00
bcd1f14cd3 feat(ttp): E.1.11 RuleStore contract — base ABC, factory, filesystem + database stubs
Adds decnet/ttp/store/ subpackage:
- base.py: RuleState frozen dataclass, RuleChange NamedTuple, RuleStore ABC
- factory.py: get_rule_store() reading DECNET_TTP_RULE_STORE_TYPE
- impl/filesystem.py: FilesystemRuleStore with sys.platform=='linux'
  fail-fast guard, allowlist filename regex, raw inotify mask bits
  (lib import deferred to E.3 so contract phase compiles without the
  asyncinotify dep installed)
- impl/database.py: DatabaseRuleStore stub (no platform guard)

TTPRule + TTPRuleState SQLModels were already shipped at E.1.1; this
commit closes the type-only TYPE_CHECKING forward-ref in
rule_engine.py via real runtime imports through the new package.
2026-05-01 07:25:09 -04:00
b6e31e64e9 feat(ttp): E.1.10 repository contract — TTPMixin with insert_tags + list_techniques_by_{identity,attacker,campaign,session} + list_distinct_techniques
Empty NotImplementedError bodies; the SQL lands at E.3 implementation.
Mixin composed onto SQLModelRepository alongside the existing domain
mixins. Dialect-specific INSERT-OR-IGNORE syntax overrides land in
the per-backend subclasses at E.3 per the dual-DB-backend convention.
2026-05-01 07:21:37 -04:00
b7f206c8c5 feat(ttp): E.1.9 API contract — seven router endpoints, admin-gated state mutations, response models
Mounts /api/v1/ttp/* with empty-list / empty-Navigator responses.
GET endpoints viewer-gated; POST/DELETE /rules/{rule_id}/state
admin-gated server-side. POST parses JSON manually so a malformed
body returns the documented 400 (per feedback_schemathesis_400).

Drops xfail-strict markers from E.2.8 tests now that the router is
mounted; 26 tests pass against the contract handlers.
2026-05-01 07:20:13 -04:00
cfbfaabfcd feat(ttp): E.1.8 UKC bridge contract — ATTACK_TACTIC_TO_UKC + tactic_to_ukc_phase + inverse 2026-05-01 07:12:00 -04:00
b5a19301a2 test(ttp): E.2.8 API shape + auth — GET 200/401 + admin-only POST/DELETE 401/403/200/400 contract 2026-05-01 07:00:41 -04:00
0cdf8d90da test(ttp): E.2.7 decoupling lint — TTP code may not import decnet.intel.* providers or decnet.profiler.keystroke 2026-05-01 06:58:12 -04:00
e2078c868d test(ttp): E.2.6 lifter tolerates absence — six lifters return [] on empty joins, no ERROR logs 2026-05-01 06:57:29 -04:00
1ffaa3df41 test(ttp): E.2.5 RuleEngine behavior — empty store, malformed YAML, multi-emit fan-out, version collisions 2026-05-01 06:56:28 -04:00
5accf8f1b1 test(ttp): E.2.4 Tagger ABC conformance — hypothesis fuzz over swallowed Exception types 2026-05-01 06:54:29 -04:00
cce84f23dc test(bus): E.2.3 TTP topic naming — constants, builders, wildcard match 2026-05-01 06:53:05 -04:00
e58aa4fe3a test(ttp): E.2.2 idempotency — determinism, golden value, replay-safety signature lock 2026-05-01 06:45:49 -04:00
e6f1da2344 test(ttp): E.2.1b evidence shape — TypedDict keys, PII §6 type-level assertion 2026-05-01 06:45:35 -04:00
c3a799726f test(ttp): E.2.1 schema invariant tests — CHECK, ValueError guard, UUIDv5, JSON round-trip 2026-05-01 06:44:57 -04:00
19cc8aa859 feat(ttp): E.1.7 worker contract — run_ttp_worker_loop, _TOPICS, registry entry 2026-05-01 06:33:34 -04:00
208ffd8f4f feat(ttp): E.1.6 per-lifter contracts — six TolerantTagger subclasses 2026-05-01 06:31:31 -04:00
cb9d183c20 feat(ttp): E.1.5 RuleEngine contract — CompiledRule, RuleSchema, RuleEngine ABC 2026-05-01 06:30:12 -04:00
a703f9eda7 docs(ttp): mark E.1.3 and E.1.4 as done in design doc 2026-05-01 06:22:08 -04:00
c3c5813211 feat(ttp): E.1.3+E.1.4 Tagger ABC and composite factory contract
Third and fourth TTP-tagging contract commits, plus a scoped subset
of the E.2.4 conformance tests covering the contract surface shipped
here (full hypothesis-fuzz suite still lands with E.2.4).

E.1.3 — decnet/ttp/base.py
- TaggerEvent NamedTuple: source_kind, source_id, attacker_uuid,
  identity_uuid, session_id, decky_id, opaque payload.
- Tagger(ABC) with abstract async tag(); class-level name and
  HANDLES: frozenset[str] (default empty so a misconfigured subclass
  is loudly idle, not loudly noisy).
- TolerantTagger(Tagger): concrete tag() wraps abstract _tag_impl()
  in try/except Exception (deliberately not BaseException — so
  KeyboardInterrupt / SystemExit / asyncio.CancelledError propagate
  and the worker can shut down cleanly). Swallowed exceptions log
  at WARNING with exc_info, never ERROR — absence is the steady
  state, not a bug. Subclasses override _tag_impl, never tag — the
  tolerance contract is enforced in the base class, not on trust.
- KNOWN_SOURCE_KINDS: Final[frozenset[str]] enumerating every
  source_kind a producer is allowed to emit. Closed-by-enumeration
  at the runtime layer; the composite tagger keys its WARNING/INFO
  bridge off this constant to surface the silent-drop trap from
  the design doc (lines 160–195).

E.1.4 — decnet/ttp/factory.py
- get_tagger() reads DECNET_TTP_TAGGER_TYPE (default 'composite');
  unknown values raise ValueError with the known-list. Mirrors
  decnet.intel.factory and decnet.clustering.factory.
- _KNOWN = ('composite',). Per-lifter classes (E.1.6) are children
  of the composite, not standalone tagger types.
- CompositeTagger(Tagger): pre-computes a dict[str, list[Tagger]]
  dispatch index from each lifter's HANDLES; fans events out
  concurrently with asyncio.gather and concatenates results.
  Empty lifters=[] is the legal contract-phase state — E.1.6
  wires the real lifters in.
- Unhandled-event observability: source_kind in KNOWN_SOURCE_KINDS
  but no lifter claims it -> WARNING once per kind per process
  (missed E.1.6 update). Unknown kind -> INFO once per kind per
  process (future-feature telemetry, by design). Per-process dedup
  via plain set; E.1.6 may swap in a proper rate-limiter once
  production traffic shapes are known.

Tests — tests/ttp/test_base.py, tests/ttp/test_factory.py
- Tagger / TolerantTagger abstractness, missing-tag-impl rejection,
  WARNING-not-ERROR log level, propagation of KeyboardInterrupt /
  SystemExit / asyncio.CancelledError.
- Factory env-var routing, unknown-name ValueError, dispatch-index
  correctness, only-claiming-lifter invocation, WARNING-once for
  known-but-unclaimed kinds, INFO-once for unknown kinds, result
  concatenation across lifters.

Mypy clean under .311/bin/mypy --ignore-missing-imports.
2026-05-01 06:20:10 -04:00
e395306dcb feat(ttp): E.1.2 bus topic contract — TTP_TAGGED, TTP_RULE_FIRED, TTP_RULE_SUPPRESSED, EMAIL_RECEIVED
Second TTP-tagging contract commit. Constants only — no publishers,
no subscribers, no tests. (E.2.3 ships the bus-topic naming tests.)

- New roots: EMAIL, TTP.
- New leaves: EMAIL_RECEIVED ('received', single-token under EMAIL),
  TTP_TAGGED ('tagged'), TTP_RULE_FIRED ('rule.fired'),
  TTP_RULE_SUPPRESSED ('rule.suppressed'). Per-rule reload + state
  topics ship with the RuleStore (E.1.11) — co-located with
  producer.
- New builders: email_topic(event_type), ttp(event_type),
  ttp_rule_fired(technique_id). The ttp_rule_fired builder validates
  technique_id as a single segment so sub-techniques like T1110.001
  are rejected at construction; topic key is the parent technique,
  sub_technique lives in the payload.
- email_topic is named with the _topic suffix to avoid shadowing the
  Python email stdlib at import sites that pull both.
- TTP_TAGGING.md E.1.2 entry corrected: the spec referenced
  'ATTACKER_ENRICHED' but the actual constant is
  ATTACKER_INTEL_ENRICHED ('intel.enriched'). The existing constant
  covers the design intent (TTP intel_lifter wakes on
  attacker.intel.enriched). No rename — would break every existing
  subscriber.

Wiki update for the four new topics ships in a sibling commit in
wiki-checkout (separate repo per project layout).
2026-05-01 06:08:11 -04:00
ce7efdfdd2 feat(ttp): E.1.1 schema contract — TTPTag, TTPRule, TTPRuleState, evidence TypedDicts, compute_tag_uuid
First contract commit of TTP tagging. Shapes only — no behavior.

- TTPTag SQLModel: deterministic UUIDv5 PK; (source_kind, source_id)
  discriminated provenance; nullable attacker_uuid + identity_uuid
  with ON DELETE CASCADE; native sqlalchemy.JSON evidence column;
  required attack_release; CheckConstraint('attacker_uuid IS NOT
  NULL OR identity_uuid IS NOT NULL'); composite indexes for the
  primary query patterns (identity_uuid+technique_id,
  attacker_uuid+technique_id, technique_id+created_at); __init__
  guard raising ValueError with both anchor names in the message
  (belt-and-braces for MySQL <8.0.16 where CHECK is silent).
- compute_tag_uuid(): RFC-4122 UUIDv5 over the six tag-identity
  fields under a fixed _TTP_TAG_NS. Pure, deterministic, replay-safe.
- Per-source_kind evidence TypedDicts (CommandEvidence,
  IntelEvidence, EmailEvidence, CanaryFingerprintEvidence) — PII
  rule lives in the type: EmailEvidence has no field for raw rcpt
  addresses or body bytes.
- TTPRule + TTPRuleState tables for the DatabaseRuleStore (E.1.11).
- All symbols re-exported from decnet.web.db.models per the
  package's existing convention.

Tests for invariants (CHECK behavior, evidence round-trip across
SQLite+MySQL, idempotency property, init-guard ordering) land in
E.2.1/E.2.2 with xfail-strict markers per Appendix E discipline.
2026-05-01 06:03:45 -04:00
d09764beec docs(ttp): add TTP tagging design (order-of-work step 1)
Pre-implementation spec for the TTP-tagging worker. Defines the
ATT&CK-canonical vocabulary, schema (ttp_tag + ttp_rule[_state]),
bus topics, worker shape, lifter layering (rule-based v0,
behavioral/intel/email v0.5, sigma/biometric later), confidence
model, API surface, UI surface, observability, performance targets,
and a CDD plan (Appendix E) that splits contracts from tests with
xfail discipline so CI stays green between steps.
2026-05-01 06:02:56 -04:00
9e003d3acd Merge branch 'merge-rehearsal' into dev
# Conflicts:
#	decnet/templates/postgres/server.py
#	decnet/templates/rdp/Dockerfile
#	decnet/templates/redis/Dockerfile
#	decnet/templates/smtp/Dockerfile
#	decnet/templates/smtp/entrypoint.sh
#	decnet/templates/snmp/Dockerfile
#	decnet/templates/snmp/entrypoint.sh
#	decnet/templates/tftp/Dockerfile
#	decnet/templates/tftp/entrypoint.sh
#	decnet/templates/vnc/Dockerfile
#	decnet/templates/vnc/entrypoint.sh
#	templates/rdp/Dockerfile
#	templates/smb/Dockerfile
#	templates/smtp/Dockerfile
#	templates/smtp/entrypoint.sh
#	templates/snmp/Dockerfile
#	templates/snmp/entrypoint.sh
#	templates/tftp/Dockerfile
#	templates/tftp/entrypoint.sh
#	templates/vnc/Dockerfile
#	tests/services/test_smtp_relay.py
2026-05-01 02:27:20 -04:00
776861a1b7 fix(types): T7 — eliminate all remaining 38 mypy errors; fix DeckyRow subscript in engine tests 2026-05-01 02:19:53 -04:00
bd50b0d8b2 fix(types): T6 — suppress scapy attr-defined on lazy imports in tcpfp.py 2026-05-01 02:19:00 -04:00
f6e67c036d fix(types): T5 — narrow AsyncClient|None with inline if; rename loop variable t→task to avoid no-redef 2026-05-01 02:18:57 -04:00
d187304e99 fix(types): T4 — stop spreading TopologySummary as dict; fix heartbeat .get() and scope param 2026-05-01 02:18:55 -04:00
0f90dcfd3e fix(types): T3 — narrow str|None at 12 sites; fix LANRow/DeckyRow subscript in mutator tests 2026-05-01 02:18:53 -04:00
65a2bdf0e7 fix(types): T2 — add missing method stubs to BaseRepository; fix get_logs/add_lan/edge/decky signatures 2026-05-01 02:18:45 -04:00
ed6263a53d fix(types): T1 — remove 15 stale type: ignore comments confirmed unused by mypy 2026-05-01 02:18:40 -04:00
ee24a7551f fix(types): T7 — eliminate all remaining 38 mypy errors; fix DeckyRow subscript in engine tests 2026-05-01 02:07:53 -04:00
7e4da95091 fix(types): T6 — suppress scapy attr-defined on lazy imports in tcpfp.py 2026-05-01 01:53:59 -04:00
b9684254f0 fix(types): T5 — narrow AsyncClient|None with inline if; rename loop variable t→task to avoid no-redef 2026-05-01 01:53:10 -04:00
e387acf79d fix(types): T4 — stop spreading TopologySummary as dict; fix heartbeat .get() and scope param 2026-05-01 01:51:43 -04:00
d637ff515e fix(types): T3 — narrow str|None at 12 sites; fix LANRow/DeckyRow subscript in mutator tests 2026-05-01 01:47:04 -04:00
502ac42518 fix(types): T2 — add missing method stubs to BaseRepository; fix get_logs/add_lan/edge/decky signatures 2026-05-01 01:28:50 -04:00
f597ab2810 fix(types): T1 — remove 15 stale type: ignore comments confirmed unused by mypy 2026-05-01 01:26:24 -04:00
19271f9319 fix(types): P3 — annotate transport in all template protocol servers; 0 errors in templates/
- asyncio.Protocol (TCP): _transport: asyncio.Transport | None = None + cast() in
  connection_made; assert guards in every method that directly accesses the field.
  Files: pop3, smtp, mqtt, postgres, mssql, mongodb, imap, ldap, redis, mysql, sip, vnc.
- asyncio.DatagramProtocol (UDP): _transport: asyncio.DatagramTransport | None = None.
  Files: snmp, tftp, SIPUDPProtocol.
- RDP: assert new_transport is not None after start_tls() to narrow Transport | None.
- FTP (Twisted): assert self.transport is not None + targeted type: ignore for imprecise
  Twisted stubs (misc/override/arg-type/attr-defined), IReactorTCP cast for listenTCP.
- conpot: proc.stdout is None guard before iteration.
- Bonus fixes surfaced by annotation:
  - smtp: get_payload(decode=True) bytes narrowing (arg-type on sha256)
  - postgres: rename shadowed `msg` param to `err_msg` in _handle_startup
  - mongodb: base64.binascii.Error → import binascii; binascii.Error
  - imap: result: list[int] = [] (var-annotated)
2026-05-01 01:09:14 -04:00
52b5074149 chore(types): P2 — mark sqlmodel_repo complete in STATIC-TYPES.md 2026-05-01 00:50:00 -04:00
614780f144 fix(types): P2 — wire _MixinBase + col() across sqlmodel_repo; suppress pydantic/SQLModel column typing false positives
- Add _MixinBase abstract class to _helpers.py: declares _session(),
  _deserialize_attacker(), _assert_pending(), _check_and_bump_version(),
  and list_running_topology_deckies() so mypy can see cross-mixin contracts
- Add _require(val, msg) helper for narrowing T | None → T
- Inherit _MixinBase in all 26 leaf mixin classes
- Wrap SQLAlchemy column method calls (.is_(), .like(), .notin_(), .in_(),
  .contains()) with col() from sqlmodel — fixes attr-defined false positives
  caused by pydantic plugin typing class-level fields as Python value types
- Wrap select(Model.field) with select(col(Model.field)) for column projections
- Add pyproject.toml [[tool.mypy.overrides]] to disable arg-type in
  sqlmodel_repo.*: pydantic plugin resolves .where(Model.field == v) as
  where(bool), a false positive; call-arg still catches real argument errors
- Remove 9 stale # type: ignore comments (logging, helpers, credentials)
- Fix telemetry.py traced() overload no-redef + misc
- Fix logs.py datetime/str operator and nullable PK comparison with col()
- sqlmodel_repo/ now has 0 mypy errors
2026-05-01 00:49:18 -04:00
d777a1c4e0 chore(types): P1 — mark all P1 items complete in STATIC-TYPES.md 2026-05-01 00:23:30 -04:00
9cf7bc5aab fix(types): P1 — remove 2 stale type: ignore[no-untyped-def] in clustering adapters 2026-05-01 00:22:33 -04:00
5a240a3d55 fix(types): P1 — widen _send_json data param to dict | list in elasticsearch server 2026-05-01 00:22:16 -04:00
05cdd72d51 fix(types): P1 — annotate ranges: list[Range] in geoip/rir and asn/iptoasn providers 2026-05-01 00:21:44 -04:00
6f8f2ed573 fix(types): P1 — type: ignore[abstract] for plugin-discovery cls() call in registry 2026-05-01 00:21:09 -04:00
23caa86266 fix(types): P1 — pydantic.mypy plugin, types-PyYAML stub, pin mypy<1.20 2026-05-01 00:20:54 -04:00
909913e912 fix(types): P0 mypy — explicit binascii import, drop dead or None in ntlmssp
syslog_bridge.py: base64.binascii is not a public mypy-visible attribute;
import binascii directly and reference binascii.Error at the except clause.
Propagated to all 26 template subdirectory copies (all were drift-free).

ntlmssp.py: `principal = username or None` widened the type to str | None
for no runtime reason — _decode_str() always returns str.  Drop the `or None`.
Propagated to smb/ and rdp/ copies.

762 → 722 mypy errors (-40).
2026-05-01 00:09:00 -04:00
fc1f0914b7 refactor(topology): introduce TopologyRepository protocol with DTO return types
Replace repo: BaseRepository with a structural TopologyRepository protocol
in persistence.py and allocator.py. All read methods now return typed DTOs
(TopologySummary, LANRow, DeckyRow, EdgeRow) instead of raw dicts, eliminating
silent field-shape regressions across the topology subsystem.

TopologySummary gains email_personas and language_default so api_personas.py
can continue reading those fields via attribute access. hydrate() converts
DTOs to dicts before passing to _backfill_decky_configs, keeping the mutable
working-state function dict-based at its boundary. All production callers
(router handlers, mutator, CLI, heartbeat) migrated from dict/get access to
attribute access. 134 tests pass.
2026-04-30 23:51:41 -04:00
3456d3ab45 fix(models): Literal types on topology enum fields, hoist _MUTATION_OPS, top-level json import
MutationRow.op was str despite _MUTATION_OPS existing; Topology.mode/status,
TopologyDecky.state, TopologyMutation.op/state carried valid values only in
comments; deferred json import had no justification.

- Promote _MUTATION_OPS before table classes so table fields can reference it
- Add sa_column=Column(String) on each Literal-annotated table field to satisfy
  SQLModel 0.0.38 column-type inference
- Move import json to module top; remove deferred import inside _decode_json_payload
- MutationRow.op: str -> _MUTATION_OPS
2026-04-30 23:17:24 -04:00
3cb0203d07 fix(frontend): layout viewport selector, credentials tab order, Attackers.css shared import 2026-04-30 22:16:46 -04:00
eb34d0b1ea fix(event_kinds): remove probe_forwarded from INTERACTION_EVENT_TYPES 2026-04-30 22:16:11 -04:00
78d3e3a6b9 refactor(auth): hoist _CREDENTIALS_EXCEPTION constant, tighten JWT dependency chain 2026-04-30 22:16:06 -04:00
0b5228eb94 feat(config): add swarmctl-host to INI, env, CLI; drop hardcoded bind from systemd unit
[swarm] swarmctl-host → DECNET_SWARMCTL_HOST so operators set the bind
address once in decnet.ini; `decnet swarmctl` and the systemd unit both
resolve it via envvar — no --host/--port pinned on ExecStart.
2026-04-30 22:16:00 -04:00
57fecb8071 refactor(frontend): ApiError interface, tempIdSuffix rename, NET_GRID constants, extract onPaletteDrop handlers
ApiError: defined once in utils/api.ts, replaces 9 ad-hoc anonymous casts
across MazeNET, Inspector, DeckyFleet, SwarmHosts, Webhooks, PersonaGeneration,
ServiceConfigFields, CanaryTokens.

hex4 renamed to tempIdSuffix — the name now matches the comment that already
explained its purpose.

NET_GRID_{W,H,GAP,COLS} extracted from inline magic numbers to module-level
constants in MazeNET.tsx.

onPaletteDrop (130-line useCallback) split into three module-level handlers
(_dropNetwork, _dropArchetype, _dropService); the callback becomes a 10-line
router.
2026-04-30 22:14:20 -04:00
b754e9aa8b refactor(validate): move forwards_l3 overload explanation into check docstring
The 17-line block comment at _RULES was prose covering for a design wart.
The explanation belongs on the function itself — moved there and condensed.
_RULES now has a 2-line pointer instead of an essay.
2026-04-30 22:10:41 -04:00
402d6584ba fix(topology_store): use sqlite3.Row for named column access in current()
Row unpacking by positional index breaks silently on schema changes.
row_factory = sqlite3.Row gives named access with zero overhead.
2026-04-30 22:09:51 -04:00
9ad62d8177 fix(compose): name the topology_id prefix length constant
topology_id[:8] appeared twice with no explanation. 8 chars is the
git short-SHA convention; collision-safe within a single deployment's
network namespace.
2026-04-30 22:09:26 -04:00
eb7ccd0006 fix(reuse_worker): remove noqa: BLE001 (rule not in ruff select)
fix(generator): correct service pool count in _SVC_MIN/_SVC_MAX comment

BLE001 is not in ruff.toml select (F/ANN/RUF/E/W only); the suppressions
were whispering apologies to a linter that wasn't listening. Generator
comment now cites the actual ~28-entry non-singleton service pool.
2026-04-30 22:06:44 -04:00
17480093a9 refactor(topology_ops): decompose apply() into focused helpers
apply() was an 85-line function handling hash verification, validation,
superseding teardown, bridge/compose provisioning, and store persistence.
Extracted _check_hash_and_validate(), _teardown_superseded(), and _materialise()
so each step is independently readable and testable.
2026-04-30 21:56:48 -04:00
d1ed2701e7 refactor(generator): promote nested functions; rename used_combos to seen_service_pairs
_take_ip and _new_decky were closures capturing outer-scope state. Promoted to
module-level with explicit parameters. seen_service_pairs name makes the intent
clear — it prevents the same service frozenset from being assigned repeatedly.
2026-04-30 21:53:45 -04:00
07e6bafff8 fix(validate): narrow bare except to ImportError in psutil port-collision check
The original except Exception silently disabled port collision detection for
any runtime error — not just a missing package. Now only ImportError degrades
gracefully; real psutil failures propagate.
2026-04-30 21:53:05 -04:00
84e0ac4a43 fix(topology): cache IPAllocator host set; type repo params as BaseRepository
_host_set is computed once in __init__ — reserve() and is_free() were rebuilding
the full host frozenset on every call. BaseRepository already existed; the Any
annotations were just never updated.
2026-04-30 21:52:29 -04:00
257857338c fix(api): replace threading.Lock with asyncio.Lock for hydration guard
await inside a threading.Lock yields to the event loop while the OS
thread still holds the lock — potential deadlock under FastAPI thread
pool dispatch. asyncio.Lock is the correct primitive for async
critical sections. Also fixed stale diurnal.py docstring that had the
delegation direction backwards.
2026-04-30 21:24:11 -04:00
3fce597a70 docs(bodies): document intentional shared _body_canary in dispatch table 2026-04-30 21:19:07 -04:00
2629a8a0de fix(fake): rename prompt to _prompt, drop noqa suppression 2026-04-30 21:18:55 -04:00
a8c69155ff fix(planner): surface dropped weight entries in PUT /realism/config response
_parse_weights was silently dropping content_class values that don't
belong on their target list with no operator feedback. Changed it to
return (weights, dropped), apply_payload to collect and return all
dropped names, and put_config to include dropped_entries in the
response when non-empty.
2026-04-30 21:18:41 -04:00
8a40f6ced0 fix(personas_pool): re-stat after read to avoid caching stale mtime
The initial stat and read happened without a lock between them. A file
change mid-window stored the mtime of the pre-change stat against the
post-change content, suppressing the next reload. Re-stat after
read_text; fall back to the pre-read stat only on OSError.
2026-04-30 21:17:50 -04:00
1e1c92abc3 fix(bodies): type make_body_with_llm persona parameter via TYPE_CHECKING
The persona arg was typed Any to avoid a circular import. Added a
TYPE_CHECKING guard to import EmailPersona annotation-only so mypy
has the type without a runtime import cycle.
2026-04-30 21:17:26 -04:00
ebe15310ab fix(api): hydrate planner from DB exactly once on first GET, not on every read
get_config was calling planner.apply_payload on every GET request, racing
concurrent reads on module-level globals. Added a _hydrated flag + lock
so DB hydration runs at most once per process lifetime; put_config marks
it done too. Test fixture resets the flag between tests.
2026-04-30 21:17:03 -04:00
c7fcd86be4 fix(planner): guard apply_payload and reset_to_defaults with a lock
Concurrent PUT requests could observe a half-updated planner between
the four sequential global assignments. Added _planner_lock so the
rebind is atomic; same lock wraps reset_to_defaults.
2026-04-30 21:15:12 -04:00
f597d70430 fix(realism): use minute-precision datetime in in_active_hours
personas.in_active_hours was discarding the minute component of the
active-hours window, making "09:30-17:45" behave as "09:00-17:00".
Rewrote it to delegate to diurnal.in_work_hours (which uses full
minute arithmetic) and updated the scheduler caller to pass the full
datetime instead of now_dt.hour.
2026-04-30 21:14:36 -04:00
f6422f2529 fix(heartbeat): replace remaining bare except Exception with SQLAlchemyError and typed builtins 2026-04-30 21:08:26 -04:00
542d129d6f refactor(services_live): replace string-sniffed error dispatch with typed exception subclasses
ServiceNotFoundError (→ 404) and ServiceConflictError (→ 409) replace the
"not found" / "already on" / "not on" substring checks in _map_mutation_error;
base ServiceMutationError still maps to 422. Fixes three pre-existing test
status-code assertions (201 vs 200 on POST endpoints).
2026-04-30 20:49:29 -04:00
a5487eb55f refactor(enroll-bundle): extract bundle_builder and move DTOs to swarm models
Pure tarball construction (_build_tarball, _render_*, _iter_included,
_SYSTEMD_UNITS) moved to decnet/swarm/bundle_builder.py — no FastAPI
dependency, independently testable. EnrollBundleRequest/Response moved
to decnet/web/db/models/swarm.py alongside the other swarm DTOs.
Router drops from 504 to 260 lines; keeps only the in-memory token
registry, sweeper, and endpoints.
2026-04-30 20:39:42 -04:00
e124f9e296 refactor(swarm): extract _shard_payload helper and promote _dispatch to module-level 2026-04-30 20:25:38 -04:00
c648d8b04e fix(heartbeat): replace bare except Exception with specific types and intent comments 2026-04-30 20:19:52 -04:00
72498f81b2 fix(ui): surface attacker date_hdr in mail table and drawer
MailDrawer was reading fields.date / from_addr / message_id —
all wrong; actual log field names are date_hdr, from_hdr,
message_id_hdr, to_hdr.  The mail table in AttackerDetail
showed only DECNET capture time and used from_addr instead
of from_hdr.  Add a DATE (attacker) column so the attacker-
supplied Date header (including timezone) is visible at a
glance — useful for correlating campaigns like the Tiscali
run where IPs used distinct TZs (+0800 vs -0700).
2026-04-30 14:11:08 -04:00
d0b07bdf52 fix(smtp_relay): inject From: header if absent so attacker address shows in client
Relay-test scripts send minimal DATA with no headers. Without a From:
header the mail client falls back to displaying the envelope sender
(upstream_sender). Inject From: <attacker MAIL FROM> before forwarding
when the message has no existing From: header.
2026-04-30 12:43:41 -04:00
4d12fb6a03 fix(smtp_relay): upgrade to STARTTLS before AUTH if server advertises it
Servers like mail.resacachile.cl only expose AUTH after STARTTLS. Issue
starttls() + re-ehlo() when the server advertises the extension.
2026-04-30 12:40:17 -04:00
633594b110 fix(smtp_relay): use correct async-for bus subscription in probe listener
bus.subscribe() is sync and returns an async iterator, not a coroutine.
Awaiting it caused an immediate crash at startup; bus.next_message() does
not exist either. Rewrote _run_smtp_probe_listener to use the standard
pattern: sub = bus.subscribe(...) / async with sub / async for event in sub.
2026-04-30 12:35:45 -04:00
761c23a07c fix(smtp_relay): emit service=smtp_relay in syslog so ingester can gate probe publish
SERVICE_NAME was hardcoded to 'smtp' in server.py; the ingester's probe
publish guard checked service == 'smtp_relay' and never matched.

Read SMTP_SERVICE_NAME from env (default 'smtp'); smtp_relay compose
fragment sets it to 'smtp_relay' so the two services are distinguishable.
2026-04-30 12:31:29 -04:00
f0d47c5195 fix(smtp): chmod quarantine dir before dropping to logrelay
The bind-mounted quarantine dir is owned by the host decnet user; the
logrelay process had no write access because the Dockerfile USER directive
pre-applied before the entrypoint could fix permissions.

Run entrypoint as root, chmod 0777 the quarantine dir, then exec the
server under logrelay via su.
2026-04-30 12:25:37 -04:00
8ae7b9636e feat(smtp_relay): move probe forwarding to realism worker via bus
Attacker probe emails are now forwarded by the master (realism worker)
rather than inside the MACVLAN container, which has no internet gateway.

- New smtp.probe.pending bus topic: ingester publishes when smtp_relay
  message_stored fires; worker subscribes and does the actual delivery
- decnet/orchestrator/drivers/smtp_relay.py: pure-sync forward_probe()
  reads the .eml from disk and sends via smtplib on a thread executor
- worker.py: _run_smtp_probe_listener + _handle_probe_pending subtask;
  limit enforced via count_probe_relays() (DB-backed, restart-safe)
- bounties.py: count_probe_relays() query on probe_relay bounty type
- fleet.py: get_fleet_decky_by_name() to pull service config from DB
- services/smtp_relay.py: upstream_* and probe_limit fields defined in
  config_schema but NOT injected into container env (credentials stay
  out of docker env vars)
- ingester.py: stripped of smtplib; publishes probe.pending and exits
- tests: assert upstream keys absent from container environment
2026-04-30 12:10:58 -04:00
4c0a1309f0 fix(smtp_relay): log upstream error reason in probe_forwarded event
forwarded=0 was silent — now fwd_error carries the exception string so
you can see exactly why the upstream refused (auth failure, connection
refused, timeout, etc).
2026-04-30 11:57:07 -04:00
c78ba6f698 fix(deploy): pre-remove container by name before force-recreate
Docker Compose tracks the previous container by internal ID. When that
container was already removed or renamed, --force-recreate fails with
"No such container". Remove by name first so Compose always starts clean.
2026-04-30 11:54:00 -04:00
fdf38a9d8c feat(smtp_relay): add upstream_sender to fix SPF on probe forwarding
Override the envelope MAIL FROM with a domain we own when talking to the
upstream relay. SPF passes at the recipient; the attacker's From: header
inside the message body is untouched so they see their own address in their
inbox and believe the relay is real.
2026-04-30 11:47:18 -04:00
24cdef9246 feat(smtp_relay): ingest probe_forwarded as probe_relay bounty
Adds probe_forwarded to meaningful event kinds and stores it in the
bounty table as bounty_type=probe_relay with forwarded=true/false, so
the dashboard shows whether the upstream actually accepted the test email.
2026-04-30 11:32:14 -04:00
9a4fe2677b feat(smtp_relay): forward probe emails upstream so attackers verify relay works
First SMTP_PROBE_LIMIT messages per source IP are forwarded via a real
upstream relay (SMTP_UPSTREAM_HOST/PORT/USER/PASS) so the attacker's
test email actually lands in their inbox. All subsequent messages from
the same IP get 250 Ok but only hit the quarantine — campaign content
captured, nothing delivered.
2026-04-30 11:21:04 -04:00
4b7cb42ab1 fix(profiler): extract commands when MSGID=command, not just MSGID=NIL
The Dockerfile PROMPT_COMMAND logger uses --msgid command, so the MSGID
field arrives as 'command' not '-'. The CMD rewrite block was guarded by
event_type == '-' so it never fired, leaving fields['command'] unpopulated
and cmd_text=None for every SSH session command.

Broaden the guard to also match event_type == 'command' with no existing
'command' field, which covers both the intended (MSGID=NIL) and actual
(MSGID=command) wire formats.
2026-04-30 10:57:29 -04:00
bbb1762250 fix(export): one attacker per line in exported JSON 2026-04-30 10:45:03 -04:00
2ddba04f79 feat(attackers): add JSON export endpoint and download button 2026-04-30 10:43:46 -04:00
f0756dcdec fix(ui): use overflow: clip on dash panels so inner scrollbars aren't masked 2026-04-30 00:34:40 -04:00
18393f1e1c fix(ui): bound dashboard height so panels don't overflow viewport
.content-viewport is overflow-y: auto so flex:1 on dash-grid grew to
content height. Fix: dashboard uses height:100% instead of min-height,
and :has(>.dashboard) disables content-viewport scroll only on that
route — all other pages keep their normal scroll.
2026-04-30 00:32:16 -04:00
9ed0094045 fix(ui): reset live feed scroll to top on log update
Sticky thead was floating mid-content when the container auto-scrolled
as new log entries arrived. Pinning scrollTop to 0 on each logs update
keeps the thead at position 0 where it belongs.
2026-04-30 00:30:46 -04:00
fca0953439 fix(ui): dashboard grid fills available viewport height
Use flex: 1 on dash-grid instead of height: 480px so the panels
consume all remaining space below the stat cards; dash-side uses
height: 100% to fill its grid cell
2026-04-30 00:27:47 -04:00
b364c41736 fix(ui): dashboard panel heights + missing icon
- Use height: 480px on .dash-grid so both columns are the same height;
  side panels split that height via flex instead of their own max-height
- Add LayoutDashboard icon to the DASHBOARD page header
2026-04-30 00:24:27 -04:00
fbc9877ef2 fix(ui): follow-up polish — icons, dashboard bar, filter redesign, bounty/creds sort
- Dashboard: fix invisible bar at bottom of LIVE FEED by constraining
  max-height on the section instead of the inner container; same fix
  for side panels
- Page icons: add violet-accent icon beside h1 on all 9 missing pages
  (CanaryTokens, RealismConfig, SyntheticFiles, PersonaGeneration,
  Attackers, Webhooks, LiveLogs, Topologies, DecoyFleet)
- Attackers filter chips: replace ad-hoc chip buttons with seg-group
  tabs (ALL / ACTIVE N / PASSIVE N / INACTIVE N) matching Credential
  Vault style; country chips use same seg-group treatment
- Credential Vault: add sortable headers to REUSE tab (LAST SEEN,
  PRINCIPAL, KIND, TARGETS, ATTEMPTS); reuses same SortTh pattern
- Bounty: remove CREDENTIALS and PAYLOADS tabs; keep ALL, ARTIFACTS,
  FINGERPRINTS; add EMAIL (artifact subtype, filtered client-side)
2026-04-30 00:20:25 -04:00
9adee07d21 feat(ui): frontend polish sweep — 8 UX fixes
- DeckyFleet: card click opens inspect side-drawer instead of
  auto-filtering (localSearch filter behavior removed)
- Dashboard: LIVE FEED / DECKIES UNDER SIEGE / TOP ATTACKERS panels
  now have fixed max-height with overflow scroll instead of growing
- parseEventBody: defensive RFC 5424 header strip so raw syslog lines
  from the collector render as k=v pills instead of raw text
- Attackers: search placeholder updated; activity (Active/Passive/
  Inactive) and country chip filters added on top of existing IP search
- Credentials + Bounty: sortable column headers (click to asc/desc/clear)
- SwarmHosts + RemoteUpdates: icon extracted from <h1> into flex div
  with violet-accent class, matching site-wide Identities pattern
- Swarm.css: fix --panel-border undefined variable → --border so the
  title border-bottom line is visible on SwarmHosts and RemoteUpdates
2026-04-29 23:56:38 -04:00
a322d88b3c fix(tarpit): resolve topology container name in watcher before PID lookup 2026-04-29 21:14:21 -04:00
917f7e8e54 feat(tarpit): MazeNET topology-scoped tarpit — Inspector controls + topology API 2026-04-29 21:10:02 -04:00
f84c66cf9b feat(ui): tarpit controls on DeckyCard — three-dot dropdown + enable/disable 2026-04-29 20:56:51 -04:00
07b32e2abe fix(tests): patch add_service/remove_service at the router import, not the module
Monkeypatching services_live.add_service had no effect because api_services
already held a local reference to the name. Patch api_services.add_service
and update fake stubs to accept the config kwarg added to the real signature.
2026-04-29 18:50:21 -04:00
5f4005c47a feat(tarpit): port-selective tc netem tarpit mode with live log events
- GET/POST/DELETE /api/v1/deckies/{name}/tarpit (admin write, viewer GET)
- get_container_veth() + get_container_pid() in network.py via iflink/ip-link
- TarpitRule SQLModel table + TarpitMixin repo (upsert/get/delete/list)
- Background tarpit_watcher_worker: polls /proc/{pid}/net/tcp every 15s,
  emits tarpit_enter/tarpit_exit log events (edge-triggered, with duration)
- tarpit_enabled/tarpit_disabled logs on operator POST/DELETE actions
2026-04-29 18:49:42 -04:00
2fc5f1bdc5 feat(canary): auto-deregister fingerprint slug after first valid beacon
Once a fingerprint canary's HTTP beacon passes all 4 validation layers
and the trigger row lands, the token is immediately set to state=revoked
and canary.<id>.revoked is published on the bus. The slug lookup is
tightened to only return planted tokens, so subsequent requests to the
same URL silently return the transparent GIF without persisting anything
(stealth posture preserved). Plain http/dns canaries with no
fingerprint_nonce are not affected.

Changes:
- sqlmodel_repo/canary.py: add state == "planted" filter to
  get_canary_token_by_slug so revoked slugs resolve to None
- worker.py: after record_canary_trigger, if parsed_fp survived all
  layers and token has a fingerprint_nonce, call
  update_canary_token_state("revoked") + publish CANARY_REVOKED; errors
  are best-effort (trigger row already landed)
- test_worker_http.py: assert state=revoked in test_fp_valid_nonce_persists;
  new test_fp_deregisters_slug_after_valid_hit (second hit records nothing);
  new test_plain_http_canary_not_deregistered (env_file stays planted)
2026-04-29 17:49:31 -04:00
b26dd8f529 feat(canary): API-trashing defense — 4-layer fingerprint validation
Adds per-mint nonce gating, structural shape validation, mint UUID
consistency checks, and a per-(token, IP) rate limiter to the canary
worker so attackers who extract a canary from a decky filesystem cannot
poison fingerprint forensics by replaying or forging ?d= submissions.

Changes:

base.py
  fingerprint_nonce: Optional[str] added to CanaryArtifact so generators
  can surface the nonce to the cultivator without coupling the generator
  directly to DB code.

obfuscator.py
  nonce_for(callback_token, mint_uuid): HMAC-SHA256 keyed on
  DECNET_CANARY_FINGERPRINT_SECRET, truncated to 16 hex chars.
  FingerprintSecretMissing raised at mint time if env var is unset.
  render_fingerprint_js() now accepts nonce= and substitutes MINT_NONCE.

fingerprint_payload.js
  New MINT_NONCE placeholder. Appended as &k= on all beacon URLs (bare-open,
  single-shot, chunked). Using &k= avoids colliding with &n= (chunk total).

fingerprint_html.py / fingerprint_svg.py
  Derive nonce via nonce_for() and pass to render_fingerprint_js(). Set
  artifact.fingerprint_nonce so the cultivator can persist it.

cultivator.py
  Passes fingerprint_nonce into create_canary_token() when present on the
  artifact; NULL for all non-fingerprint generators.

canary.py (model)
  fingerprint_nonce: Optional[str] = Field(default=None, max_length=16)
  added to CanaryToken. None for non-fingerprint tokens.

worker.py
  _extract_fingerprint now returns (meta_dict, parsed_fp) tuple.
  _record_hit accepts parsed_fp + raw_nonce and runs 4 layers after
  token lookup: nonce match, shape check, mint UUID consistency, rate limit.
  Each failure sets _fp_invalid_* flag and drops structured _fp.
  Trigger row always lands regardless.

tests/canary/conftest.py
  Session-scoped autouse fixture sets DECNET_CANARY_FINGERPRINT_SECRET so
  fingerprint generator and worker tests work offline.

tests
  5 new worker HTTP tests and 2 new generator tests covering each
  validation layer.
2026-04-29 17:41:04 -04:00
f86dc79990 feat(canary): ship Node helper with wheel + install-toolchain CLI
The fingerprint canaries' obfuscator shells out to a Node helper that
require()s javascript-obfuscator. Without this commit, a fresh
pip install decnet would land the .py modules but not the .js helper /
package.json, and there'd be no documented way to provision Node side.

* pyproject.toml - extend tool.setuptools.package-data to ship
  canary/_obfuscate_helper.js, canary/fingerprint_payload.js, and
  canary/package.json with the wheel.
* decnet/cli/canary.py - new "decnet canary-install-toolchain"
  subcommand. Resolves decnet.canary.__file__'s dir, runs
  npm install --omit=dev there, exits non-zero with a clear message
  if npm is missing or install fails. Idempotent - safe to call
  every API service start.
* deploy/decnet-api.service.j2 - non-fatal ExecStartPre that calls
  the new subcommand. Leading '-' so a missing Node toolchain only
  degrades fingerprint canaries (loud at mint time) without keeping
  the API from booting.
* tests/canary/test_cli.py - registration smoke test, missing-npm
  exit path, and a mocked-subprocess test asserting the right argv
  and cwd land on npm.

Realism cultivator already has a broad except Exception around
cultivate() in scheduler.py:195-211, so a missing toolchain on a
host running the realism tick degrades to an inert noise file with
no extra plumbing.
2026-04-29 16:53:27 -04:00
907ade9142 feat(realism): wire fingerprint_html/svg through taxonomy + UI
The two new fingerprint canary generators existed at the API level
since f64e78f but weren't visible to the realism engine or the
operator-facing dashboard. Threads them through every place that
enumerates canary content classes.

Backend:
* realism/taxonomy.py - two new ContentClass members
  (CANARY_FINGERPRINT_HTML, CANARY_FINGERPRINT_SVG); enum is
  wire-visible (synthetic_files.content_class column + bus discrim)
  so we add at the bottom, never reorder.
* canary/cultivator.py - class-to-generator dispatch, kind mapping
  (both http), and default placement paths
  (~/Documents/asset_directory.html and network_topology.svg).
* realism/naming.py + bodies.py - _name_canary / _body_canary entries.
* realism/planner.py - added to _DEFAULT_CANARY_CLASS_WEIGHTS and
  the _CANARY_CLASSES classification set.

Frontend:
* decnet_web/src/realism/labels.ts - display labels.
* decnet_web/src/components/RealismConfig/RealismConfig.tsx - default
  canary weight rows so operators see them in the realism config UI.
* decnet_web/src/components/SyntheticFiles/SyntheticFiles.tsx - added
  to the CONTENT_CLASSES allow-list so filter dropdowns show them.

Also: re-applied the nosec B404/B603 markers on canary/obfuscator.py;
the first commit's pre-commit autoformatter stripped them.

Tests: extended tests/realism/test_taxonomy.py's stability assertion
to include the two new values. Full canary + realism suites pass
(362 / 2 skipped).
2026-04-29 16:44:03 -04:00
de6d5cd1a8 fix(canary): include fingerprint_* in KNOWN_GENERATORS stability test 2026-04-29 16:26:09 -04:00
dd807bc55e feat(canary): worker decodes ?d=/?o=/?s=&i=&n=&d= fingerprint params
The fingerprint payload beacons fingerprint data as base64url JSON in
GET query params: ?o=1 for the bare-open beacon, ?d=<blob> for a
single-shot dump, or ?s/i/n/d=<chunk> for chunked dumps. Until now
those params were buried inside request_path; consumers had to parse
the URL themselves.

Worker now extracts them in _extract_fingerprint and merges into
raw_headers under reserved _fp* keys:

* _fp_open       — bare-open marker
* _fp            — decoded fingerprint dict (single-shot path)
* _fp_sid/idx/total/chunk — chunked metadata + raw base64 (reassembly
  is a downstream concern, not the worker's job)
* _fp_decode_error / _fp_oversize — failure markers for trash dumps

Per-chunk size capped at 8KB so an attacker spamming /c/<known_slug>
can't inflate trigger rows indefinitely. Decode failures degrade
gracefully — the trigger row still records the hit, just with a
_fp_decode_error flag instead of structured fingerprint data.

Tests cover the single-shot decode, bare-open flag, chunked metadata,
malformed input, and oversize drop paths.
2026-04-29 16:25:17 -04:00
f64e78f78c feat(canary): fingerprint_html + fingerprint_svg generators
Two new synthesised-artifact generators that bake the obfuscated
fingerprint payload into plausible-looking decoy files:

* fingerprint_html — a mundane "Internal Asset Directory" page with a
  small table of fake hosts; the obfuscated payload is inlined at the
  bottom of <body>. Visible content (row pool slice, sync timestamp)
  also varies per mint via SHA-256-derived stable ints, so two
  extracted canaries don't diff to zero even on the rendered surface.
* fingerprint_svg — standalone SVG with an embedded <script> CDATA
  block. SVG <script> only fires for top-level loads / <object> /
  <iframe>; <img>-referenced renders are safely inert.

Both derive the mint UUID via uuid.uuid5 from the callback token, so
re-mints are byte-identical (preserving the generator determinism
contract) AND the same token produces the same mint UUID across HTML
and SVG variants — the worker can correlate beacons across artifact
shapes.

Wired into the factory + KNOWN_GENERATORS, default placement paths
under ~/Documents/asset_directory.html and ~/Documents/network_topology.svg
for both linux and windows personas. Tests cover determinism, per-token
divergence, structural validity (DOCTYPE/SVG headers), and that the
beacon URL stays inside the obfuscated string array (not in plaintext).
The two new entries skip in test_generators.py when Node toolchain is
absent so bare CI checkouts still pass.
2026-04-29 16:22:18 -04:00
12cd7ad9cb feat(canary): per-mint JS obfuscator wrapper + fingerprint payload
Adds the load-bearing primitives for obfuscated browser-fingerprinting
canaries. Step 3 (HTML/SVG generators) and step 4 (worker-side
fingerprint ingestion) build on top of these.

* decnet/canary/obfuscator.py - javascript-obfuscator wrapper. Seed
  and polymorphic config bits both derive from the callback token, so
  output is byte-identical for the same mint (preserving the generator
  determinism contract from base.py) and structurally distinct across
  mints.
* decnet/canary/fingerprint_payload.js - port of canary-self-test.html
  with the rendering UI stripped. Two placeholders (BEACON_URL,
  MINT_UUID) substituted before obfuscation. MVP beacon strategy:
  bare-open GET pixel first, then base64url-encoded fingerprint as
  query params on subsequent GETs (chunked above ~6KB) so the existing
  worker records hits before step-4 lands.
* decnet/canary/_obfuscate_helper.js - Node subprocess helper that
  reads code+options JSON from stdin and writes obfuscated JS to
  stdout. Vendored javascript-obfuscator under decnet/canary/.
* tests/canary/test_obfuscator.py - determinism, per-mint divergence,
  template substitution, Node syntax check, error path.
2026-04-29 16:16:37 -04:00
eefab020d4 fix(swarm): propagate service mutations to worker agent via shard re-dispatch
Add/remove/update_config on a fleet decky living on a swarm worker — and on
an agent-pinned topology — used to run the master's local docker-compose only,
which has no containers for the remote decky. The mutation persisted on master
and silently no-op'd on the worker.

- Fleet swarm: lookup DeckyShard.host_uuid; if found, rebuild a single-host
  shard from master state and call dispatch_decnet_config — same proven path
  as POST /swarm/deploy. Skip local _compose (no containers to touch).
- Topology agent-pinned: call decnet.engine.deployer.resync_agent_topology
  (existing helper) to push the latest hydrated blob to the worker.
- Local-only deckies: behaviour unchanged.
- Tests: 5 new in tests/engine/test_services_live_swarm.py covering all
  three mutations on a swarm fleet decky (no local _compose, dispatch fires
  with the right host's deckies), plus apply=False save-only path (no
  dispatch), plus regression that local-only fleet add still runs local compose.

Bus signal `decky.{name}.service_config_changed` keeps publishing as an
audit trail; it is not the propagation trigger.
2026-04-29 12:51:16 -04:00
94b06ee862 feat(services): initial config on ADD SERVICE — schema modal in DeckyCard, MazeNET drag, and Inspector
- DeckyServiceAddRequest gains an optional `config: dict` field, validated
  against the service's config_schema before any state mutation (400 on
  bad type, no half-written rows).
- Engine: add_service threads `config` into _add_topology_service /
  _add_fleet_service, persisting validated cfg to decky_config.service_config
  BEFORE compose regen so the first `up -d --build` materialises the env on
  the new container. No follow-up apply needed.
- Frontend: shared AddServiceConfigModal — same wizard accordion shape, used by:
    * DeckyCard's ADD SERVICE picker (Fleet & MazeNET inspectors via shared component)
    * MazeNET Inspector's ADD SERVICE picker
    * MazeNET palette drag-drop onto a deployed decky
  Empty-schema services short-circuit to a one-click add (no modal flash).
  Operator can cancel; errors surface in the modal.
- Tests: add_service config plumbing — persist, drop unknown keys, 400-equivalent
  on bad types, back-compat empty-config.
- Drive-by: fix stale repo-method names in test_services_live.py
  (create_topology_decky → add_topology_decky, get_topology_decky → list+pick helper,
  service.added → service_added topic).
2026-04-29 12:44:47 -04:00
77ceb9d6f3 feat(services): config schemas for the rest of the registry + textarea base64 transport
- Declarative config_schema on RDP, Telnet, MySQL, Redis, SMTP, SMTP_Relay
  matching the keys each service already reads at compose time.
- TODO marker on the 19 services that accept service_cfg but never read it,
  so future contributors know where to plug schemas in.
- Wizard base64-wraps all textarea values at INI emit (DeckyFleet
  buildIni); validate_cfg detects the b64: sentinel and decodes back to
  UTF-8. Plain raw strings still pass through for direct API submitters.
- HTTPS image entrypoint accepts PEM content or path in TLS_CERT/TLS_KEY:
  detects a BEGIN header, writes content to /opt/tls/, and re-exports
  the on-disk path so server.py keeps reading paths.
- Tests cover schema/compose alignment for each new service plus
  textarea base64 round-trip (incl. UTF-8) and HTTPS PEM end-to-end.
2026-04-29 12:23:56 -04:00
d8fa7cc73d feat(ui): per-service config in the deploy wizard's CONFIGURATION step
Setting a password, banner or TLS material AFTER deployment forces a
container recreate on every change. The deploy wizard now lets the
operator set service config up-front so the initial build has the
right env from the start.

Mechanics:
- Extracted the schema-driven field rendering out of ServiceConfigForm
  into a standalone ServiceConfigFields component (no API/buttons,
  just inputs + onChange).  ServiceConfigForm now delegates to it.
- Wizard step 2 (CONFIGURATION) renders one accordion block per
  selected service; clicking a service reveals its schema-driven
  inputs and a 'N set' badge tracks how many overrides are populated.
  Removing a service (back to step 1) drops its config so the INI
  doesn't carry orphans.
- _buildIni emits one [<prefix>.<svc>] group subsection per service
  with at least one override.  The INI loader's prefix-matcher
  applies it to every ${prefix}-NN decky in the batch, so one block
  covers all clones.
- Multi-line string values (PEM textareas etc.) are escaped as \n
  on the way into INI; downstream consumers re-expand.
2026-04-29 12:08:17 -04:00
97260daf8d fix(ui): make .info-banner usable inside the deploy-wizard modal
PersonaGeneration.css scopes .info-banner under .persona-gen-root,
which doesn't match elements rendered inside the Modal portal —
so the wizard's CONFIGURATION-step banner I just added rendered
as plain text.

Add a page-unscoped .info-banner rule in DeckyFleet.css with the
same visual treatment (faint bg, violet left rule) so any modal
context picks it up.
2026-04-29 12:01:42 -04:00
8d3f5c646a fix(network): accept CAP_NET_ADMIN in lieu of euid==0 for macvlan setup
The systemd unit grants AmbientCapabilities=CAP_NET_ADMIN so the API
service can program host-side macvlan/ipvlan interfaces without
running as root, but setup_host_macvlan/_ipvlan rejected with euid!=0
before even trying — making web-driven 'decnet deploy' impossible
under the privilege model the unit advertises.

Replace _require_root with _require_net_admin, which reads CapEff
from /proc/self/status and accepts the cap (bit 12) as well as
euid==0. No libcap dep — pure /proc parse.
2026-04-29 11:56:40 -04:00
5912608f78 fix(ui): wizard CONFIGURATION step + drop bogus --archetype custom preview
The CONFIGURATION step had a stale disabled placeholder textarea
("per-service overrides") from before the schema-driven Inspector
landed. Replaced with a one-line info banner pointing at the Inspector,
which is now where per-service config actually lives.

The DEPLOY step's CLI preview was rendering '--archetype custom' when
pickMode==='services', but no such archetype is registered — only the
preset archetypes plus 'services' (free-form list). Drop the
--archetype line entirely in the services-mode preview so the rendered
command reflects what the API actually receives.
2026-04-29 11:56:29 -04:00
ba0e7ca476 style(ui): rebuild ServiceConfigForm in inspector terminal vocabulary
Previous CSS lived in DeckyFleet.css only, so when the form rendered
inside MazeNET Inspector the inputs fell back to browser defaults
(white-on-white, oversized labels, mismatched buttons).

New ServiceConfigForm.css ships with the component itself: small
uppercase tracking-1 labels at 0.6rem (matches kvs .k), dark
transparent inputs with violet focus, matrix-green text inside
inputs, custom select chevron, dedicated svc-cfg-btn that visually
mirrors maze-btn.small, password reveal toggle, and a 96px label
column so labels never wrap into the input. Help text drops to
0.58rem dim under the input. Works identically in both surfaces.
2026-04-29 11:50:35 -04:00
e51666ee14 fix(ui): stop ServiceConfigForm from re-fetching schema every render
The schema useEffect depended on currentConfig, which the parent
passes as a fresh `{}` literal on every render — referentially new
each time, so the effect re-ran and the GET /services/.../schema
hammered the server.

Schema fetch now only depends on serviceSlug; form seeding from
currentConfig moved to a separate effect keyed on JSON-stringified
config so a real change reseeds but referential churn doesn't.
2026-04-29 11:48:20 -04:00
bd7f2dfaed feat(ui): schema-driven ServiceConfigForm in Fleet & MazeNET inspectors
ServiceConfigForm.tsx fetches /topologies/services/{slug}/schema and renders
typed inputs (string/password/int/bool/textarea/enum) with reveal toggles for
secrets. SAVE persists via PUT (no restart); APPLY persists + force-recreates
the service container after a confirm dialog (matches the forwards_l3 pattern).

Mounts:
- DeckyFleet DeckyCard: clicking a service tag toggles the form below the
  EXPOSED row, gated on liveServicesEnabled (admin + non-swarm).
- MazeNET Inspector: renders the form above REMOVE SERVICE when a service
  is selected on a non-observed decky.

UI test plan is manual — no jsdom test infra in decnet_web yet.
2026-04-29 11:41:43 -04:00
75b1ce3a31 feat(api): per-service config schema endpoint + PUT/POST update+apply for fleet & topology
- GET /topologies/services/{name}/schema serves the declared ServiceConfigField
  metadata so the Inspector can auto-render forms.
- PUT  /(topologies/{id}/)deckies/{decky}/services/{svc}/config persists the
  validated dict (DB + compose); container untouched (Save).
- POST /(topologies/{id}/)deckies/{decky}/services/{svc}/apply persists then
  force-recreates <decky>-<svc> so the new env takes effect (Apply, destructive).
- New engine helper update_service_config wires both fleet and topology paths
  through the existing _persist_fleet_change / _rerender_topology_compose
  machinery; emits decky.<name>.service_config_changed on the bus.
2026-04-29 11:38:06 -04:00
54b1fbed14 feat(services): declarative config_schema on BaseService + SSH/HTTP/HTTPS descriptors
ServiceConfigField dataclass + BaseService.validate_cfg coerce/drop submitted
service_cfg dicts against per-service typed schemas. SSH/HTTP/HTTPS now declare
the keys they already read in compose_fragment, so the upcoming Inspector form
has metadata to render from instead of hardcoded inputs per service.
2026-04-29 11:28:53 -04:00
d314470d7f fix(stats): keep TopologyDecky.state in sync with docker so ACTIVE DECKIES counts right
Dashboard's ACTIVE DECKIES (active_deckies in get_stats_summary) counts
TopologyDecky rows where state='running'.  No code path was flipping
that state away from the default 'pending', so the count read 0/N
even when every container was running fine — the dashboard was lying.

Two complementary fixes:

1. deploy_topology — after the post-deploy compose ps verification,
   reconcile each TopologyDecky.state from the corresponding base
   container's docker state.  running → 'running'; anything else →
   'failed'.  Reuses the ps_rows already gathered for the
   ACTIVE-vs-DEGRADED status decision; no extra docker hit.

2. apply_add_decky — _materialise_decky_spawn now returns True/False;
   on True the row is updated to state='running' before
   _assert_valid_after.  Catches the case where a decky added via the
   live mutator queue stays at 'pending' indefinitely (the deployer's
   reconcile only runs on a fresh deploy_topology pass).

Existing topology deckies in active topologies will still read as
'pending' until the next deploy_topology runs, since this is
forward-only.  An operator-side fix is to teardown + redeploy or run
the (forthcoming) reconcile-on-startup pass.
2026-04-29 11:09:32 -04:00
57e527534c fix(mutator): auto-fall-back to legacy builder when buildx wedges live decky add
apply_add_decky's compose-up was hard-failing whenever the operator's
~/.docker/buildx/activity/ landed on a read-only mount — the wedge
detection in _compose_with_retry correctly refuses to retry (would
just leak more mounts), but for live materialisation we don't want a
wedged buildx state to abort an admin's mutation.  ANTI hit it on
adding decky-a977: 'failed to update builder last activity time: ...
read-only file system → buildx wedge detected → returned non-zero'.

_compose_up_with_buildkit_fallback wraps _compose_with_retry: on a
CalledProcessError whose stderr matches both wedge signatures
(_BUILDX_WEDGE_SIGNATURE + _BUILDX_EROFS_SIGNATURE), it logs a
warning with the manual recovery steps + retries once with
DOCKER_BUILDKIT=0 set.  The legacy non-buildx builder doesn't use
the activity dir and isn't affected.

Wired into the two paths that pass --build:
* _materialise_decky_spawn (apply_add_decky)
* _materialise_decky_services_diff (apply_update_decky service add)

_materialise_decky_recreate_base doesn't build — it just recreates a
container from an existing image — so it's not affected.

Operator-facing log message points at the manual fix
(rm -rf ~/.docker/buildx/activity + docker buildx create) so they
can recover at their leisure; we don't ATTEMPT the recovery because
the activity dir might be RO for a reason (zfs/btrfs snapshot, etc.)
that an automated rm would be wrong to fight.
2026-04-29 10:59:04 -04:00
892219ec87 feat(mutator): refuse forwards_l3 promotion on non-DMZ deckies
apply_update_decky's flip path now refuses to promote a decky to
gateway unless its home LAN is a DMZ.  The compose generator publishes
host ports for forwards_l3=True; a non-DMZ gateway would shadow the
host's port space without anything legitimately able to reach the
service.  Same posture as the existing 'forwards_l3 flip on live
requires force=true' guard — refused before any DB write so a bad
mutation leaves zero side-effects.

The check is intentionally NOT a standing _RULES invariant — the
codebase uses forwards_l3 for two semantics:

  1. Generic L3 forwarding (internal bridge deckies routing between
     their multi-home LANs).  The generator writes this on internal
     bridges via bridge_forward_probability; legitimately non-DMZ.
  2. DMZ gateway (host-port publisher).  Only meaningful on DMZ.

Standing validation can't enforce DMZ-homing without breaking case 1.
The guard fires only on the explicit user-driven flip path where the
operator's intent is unambiguously case 2.  Generator output and
internal-bridge attachments bypass the check.

check_gateway_homed_in_dmz lives in validate.py for callers that want
the explicit form (and for the test surface), but is not a standing
rule — comment in _RULES explains the asymmetry.
2026-04-29 00:38:51 -04:00
c002c5a4f1 feat(ui): forwards_l3 toggle in Inspector with destructive-recreate confirm
W5's apply_update_decky now accepts a forwards_l3 flip on a live
topology only when payload['force'] is true (the unforced flip raises
MutationError to keep half-thinking operators from killing
in-container state).  Until this commit there was no UI surface that
could even submit such a flip.

Inspector grows a 'PROMOTE TO GATEWAY' / 'DEMOTE GATEWAY' button when
a (non-observed) decky is selected.  The handler:

* On pending topologies → submits via editor.updateDecky immediately.
  No confirm dialog; no live containers to disturb.
* On active/degraded topologies → window.confirm() explaining the
  destructive base recreate ('In-container state is lost; active
  sessions to it drop'), then submits with extras.force=true.

useTopologyEditor.updateDecky grows an optional extras arg that
threads force: true into the queued mutation payload.  The pending
CRUD path ignores it (no force needed when no containers exist).

MazeNET.tsx wires a toggleGateway callback that handles the
optimistic local state update, surfaces an enqueue toast on the
active path, and lets the SSE forwarder reconcile when
mutation.applied lands.
2026-04-29 00:29:46 -04:00
a27e3f5e0f fix(tests+mutator): unbreak the docker-shadow test env + let mutator delete from active
Two related fixes that came out of running the W5 tests locally:

1. tests/__init__.py — empty file, makes 'tests/' a package so pytest
   stops inserting it into sys.path.  Without it, 'tests/docker/'
   (the docker-image test category) shadowed the installed docker SDK
   on every engine-touching test in the repo:

     module 'docker' has no attribute 'DockerClient'

   Pytest's default --import-mode=prepend was the culprit; making
   tests/ a package is the cheapest fix and doesn't change
   --import-mode for the whole tree.

2. delete_topology_decky / delete_topology_edge / delete_lan grow an
   'enforce_pending: bool = True' kwarg.  Default preserves the HTTP
   CRUD guard (api_decky_crud / api_edge_crud / api_lan_crud get the
   409 for free).  apply_remove_decky / apply_detach_decky /
   apply_remove_lan now pass enforce_pending=False — the mutator
   queue is the live-editing surface and has its own active-topology
   gating; the repo's pending-only guard was for design-time CRUD
   that mustn't bypass it.  Without this, apply_remove_decky was
   silently broken on active topologies pre-W5; W5's new test
   surfaced it on first run.

10/10 new W5 tests pass; 58/58 across mutator + topology suites.
2026-04-29 00:24:17 -04:00
98c929894c feat(mutator): selective materialisation for apply_update_decky + tests
apply_update_decky now discriminates three sub-cases:

* services list changed → diff old vs new and call
  _materialise_decky_services_diff (compose up -d for added,
  stop + rm -f for removed).  Mirrors services_live's pattern but
  doesn't import it — mutator-routed mutations carry a different bus
  surface (mutation.applied) than the direct API path
  (decky.<name>.service_added).
* forwards_l3 flipped → port publishing changes, which docker can
  only apply at container-create time.  Gated on payload['force'] is
  true; default raises MutationError so a half-thinking operator
  can't stomp a live decky.  When force=true,
  _materialise_decky_recreate_base does compose up -d --no-deps
  --force-recreate.  Pre-checked BEFORE the DB write so a refused
  mutation leaves zero side-effects.
* coord-only (x/y) → DB only, no docker work.

Ships tests/mutator/test_ops_materialisation.py with focused coverage
for every new helper: add_decky/remove_decky/attach_decky/
detach_decky/update_decky/update_lan paths against an active
topology, with compose primitives + docker SDK mocked at the source
modules so the helpers' lazy imports pick up the stubs.  Also covers
the pending-topology skip and the force-flag gating.
2026-04-29 00:18:20 -04:00
e3afec4e70 feat(mutator): live network.disconnect for apply_detach_decky
Symmetric to apply_attach_decky — after deleting the multi-home edge
from the DB, calls the docker SDK to drop the base container's
interface in the now-detached LAN.  Service containers lose
visibility automatically (they share the base's netns).

Idempotency: 'not connected' / 'no such' APIError is logged at info
and treated as success.
2026-04-29 00:15:39 -04:00
f347a3a736 feat(mutator): live network.connect for apply_attach_decky
After the DB writes that record the multi-home edge, calls the docker
SDK directly to add an interface to the base container's netns:

  client.networks.get(<topology bridge>).connect(<base>, ipv4_address=ip)

Non-destructive — the base keeps running, no recreate.  Service
containers automatically see the new interface because they share
the base's netns via network_mode: service:<base>.

Idempotency: docker APIError with 'already' / 'endpoint exists' is
logged at info and treated as success.  Other errors log + leave the
DB row in place; an operator retry will hit the same path.
2026-04-29 00:15:11 -04:00
eed55619cb feat(mutator): live teardown for apply_remove_decky
Captures the decky's name and services list before delete_topology_decky
runs (the helper needs both as compose targets even though the DB row
is gone), then calls _materialise_decky_remove which stops + rm -f's
the base + per-service containers via 'docker compose stop / rm -f'.

Re-renders the per-topology compose AFTER the stop/rm so a future
'compose up -d' on the file doesn't try to bring the decky back.
2026-04-29 00:14:44 -04:00
8c06190e69 feat(mutator): live spawn for apply_add_decky + shared materialisation helpers
Adds _materialise_decky_{spawn,remove,connect,disconnect,services_diff,recreate_base}
helpers alongside the existing _materialise_lan_change.  Each follows
the same skip rules: bail when topology is not active/degraded, when
agent-pinned, or when docker calls fail (logged, not re-raised — DB
remains source of truth).

apply_add_decky now calls _materialise_decky_spawn after the DB writes.
The helper:

* re-renders the per-topology compose so it lists the new decky;
* runs 'compose up -d --no-deps --build <decky_base> <decky>-<svc>...'
  in a worker thread (matches engine/services_live's pattern).

Service container targets are filtered through get_service() so
fleet_singleton services are skipped — they don't have per-decky
compose entries.  Gateway (forwards_l3=True) deckies need no
special-case here; the compose generator already emits the host
'ports:' block for them.

Subsequent commits wire the other apply_* ops to the matching
helpers.  Tests for the full set ship in the workstream's last
commit.
2026-04-29 00:14:18 -04:00
578cdf9e2e fix(mutator): reject hostile apply_update_lan changes on live topologies
subnet and is_dmz are pinned at deploy time — live deckies bind to
the bridge with IPs allocated from the old subnet, and is_dmz flips
the docker network's internal flag which can't be changed while
containers are attached.  Today the op happily wrote the new value
into the DB and left docker on the old one, drifting the two surfaces.

apply_update_lan now raises MutationError when topology status is
active or degraded and the patch touches subnet or is_dmz.  Coord
(x/y) and rename updates still pass through; renames don't currently
have a live caller and the bridge's docker name keys off the lan name
in the renderer, so the next deploy will reconcile.

This matches the posture taken by _materialise_lan_change for live
LAN add/remove (commit 472c84b).
2026-04-29 00:12:44 -04:00
2731b2608b fix(ui): keep multi-homed deckies in their home LAN on rehydrate
list_topology_edges has no ORDER BY, so SQL row order is undefined.
After apply_attach_decky added a bridge edge to a second LAN, on
refetch the bridge edge could come back first — firstLanFor then
picked it as the decky's home and the visualization 'teleported' the
decky into the other LAN (the bug ANTI saw immediately after
connecting two deckies across LANs).

Hydration now prefers the non-bridge edge (is_bridge=false) as home.
apply_add_decky writes is_bridge=false for the original edge;
apply_attach_decky writes is_bridge=true for subsequent multi-homing
edges.  Picking the non-bridge edge is stable across row reordering.

Two-pass implementation: pass 1 sets pinned homes (DMZ for gateways,
non-bridge for others); pass 2 fills any gap with the first edge
(legacy rows where is_bridge was never written).
2026-04-29 00:01:29 -04:00
472c84b9c8 fix(mutator): materialise live LAN add/remove on docker, not just the DB
apply_add_lan and apply_remove_lan were DB-only — they wrote/deleted
the topology_lans row but never created or destroyed the docker bridge
network.  Adding a LAN to a deployed topology silently did nothing on
the substrate side; any decky later attached to it had nowhere to bind.

Both ops now call a shared _materialise_lan_change helper after the DB
write.  When the topology is active/degraded and not pinned to a swarm
agent, the helper:

* creates / removes the docker bridge network (internal=True for
  non-DMZ LANs, mirroring engine/deployer.deploy_topology),
* re-renders the per-topology compose file so future redeploys reflect
  the change.

Failures are logged, not re-raised — the DB row stays as source of
truth so an operator can retry without leaking inconsistent state.
Agent-pinned topologies are skipped; the next agent push reconciles.

apply_add_decky / apply_attach_decky have the same gap and are not
fixed here — multi-homing a running container needs careful
recreate-vs-network-connect handling and is its own commit.  Without
those, dropping a decky into a freshly-added LAN still won't spawn a
container; only the LAN itself is now live.
2026-04-29 00:00:02 -04:00
bbed52a962 fix(bus): topic segments can't contain dots — service.added → service_added
Bus topic segments are NATS-style tokens and the validator at
bus/topics.py:402 rejects '.', '*', '>', whitespace.  My W3 constants
'service.added' / 'service.removed' tripped this on every live
add/remove call:

  ValueError: topic segment 'service.added' may not contain '.', ...

Renamed both to underscore form: DECKY_SERVICE_ADDED = 'service_added'.
Aligned the SSE forwarder's name mapping (decky.<name>.service_added →
SSE event 'decky.service_added') and the frontend's
useTopologyStream listener + MazeNET.tsx event handler.  Also updated
the wiki entry with a note about the underscore.
2026-04-28 23:53:25 -04:00
d595240f55 fix(engine): post-deploy verify topology containers, mark DEGRADED on boot crash
deploy_topology was flipping to ACTIVE the moment 'compose up -d'
returned 0, but compose returns 0 as soon as containers are *started*.
A service that crashes on boot (port bind failure, bad image, missing
entrypoint) left the topology row sitting at ACTIVE indefinitely while
half the substrate was dead.

After compose returns, we now run 'compose ps --all --format json',
parse the newline-delimited per-container rows, and downgrade to
DEGRADED with a reason listing the first eight unhealthy containers if
anything isn't in state='running'.  Operators see real state on the
topology page instead of an optimistic flag.

_compose_ps swallows compose-level errors (returns []) so an unrelated
docker hiccup doesn't gate the success path — the existing in-flight
exception path still catches genuine deploy failures with FAILED.
2026-04-28 23:39:50 -04:00
9e8d0b0464 fix(ui): route palette drops + design-time remove through live API on active topologies
When topoStatus is active/degraded, editor.updateDecky enqueues into
the mutator queue and returns {kind:'enqueued'}.  The palette-drop
handler then short-circuits on that and never updates local state, so
a service dragged onto a deployed decky just vanishes — what ANTI saw
as 'no way to APPLY'.

Same gap on the design-time 'REMOVE SERVICE' button in the Inspector's
service detail panel: enqueue + no local update = chip stays.

Both now route through liveAddService / liveRemoveService when the
topology is active, hitting POST/DELETE /topologies/{id}/deckies/{name}/services
directly and patching local state from the response.  Pending
topologies still queue through the mutator (correct: no live
containers to mutate).

Hoisted serviceRegistry / liveAddService / liveRemoveService above
the palette-drop callback so the deps array doesn't trip the const
TDZ at render time.
2026-04-28 23:38:37 -04:00
463877b8fc fix(ui): hit /topologies/ with trailing slash to keep bearer
FastAPI's redirect_slashes=True 307s /topologies → /topologies/, and
the browser drops Authorization on the redirected URL — the topology
picker in the canary create modal was landing as 401 even for admins.
Hit the canonical (trailing-slash) path so the request resolves on the
first hop.
2026-04-28 23:18:39 -04:00
0e5484648f feat: forward decky.*.service.* on per-topology SSE stream
The /topologies/{id}/events SSE proxy now subscribes to two bus
patterns concurrently and merges them through a bounded asyncio.Queue:

* topology.{id}.>  — lifecycle (status, mutation.*) — unchanged.
* decky.>          — per-decky events, filtered by payload.topology_id
                     so a fleet decky sharing a name with a topology
                     decky doesn't leak across.

_sse_name_for routes 'decky.<name>.service.added' to the SSE event
name 'decky.service.added' (kept the prefix so the frontend doesn't
collide with topology lifecycle events that share leaf names like
'status').

useTopologyStream surfaces the two new event names; MazeNET.tsx's
onStreamEvent optimistically patches the matching node's services
list so a second tab reflects shape changes without a refetch.
2026-04-28 23:15:38 -04:00
e7d49d7237 feat(ui): live service add/remove on fleet DeckyCard
DeckyCard grows the same per-chip × + dashed '+ ADD' affordances we
just shipped on the MazeNET Inspector.  Wired to POST/DELETE
/api/v1/deckies/{name}/services{,/svc}; the response's services list
flows back through onServicesChanged to update the parent's deckies
state without a refetch.

Gated on isAdmin && !decky.swarm — swarm deckies live on a remote
agent and the W3 endpoint runs docker compose locally, same gap as
the canary planter has for agent-pinned topologies.  Out of scope
here; flagged as a known limitation.

stopPropagation on the inline buttons + add-row container keeps the
card-level click (which selects the decky for inspection) from firing
on intra-row interactions.
2026-04-28 23:13:46 -04:00
1a631c9400 fix(ui): narrow services type for Inspector live-add picker
ObservedNode.services is the literal tuple ['*']; narrowing inside the
.filter() callback was tripping TS2345.  We already gate the live
controls on node.kind !== 'observed', so casting to readonly string[]
inside the filter is safe and keeps the discriminated union strict
elsewhere.
2026-04-28 23:11:39 -04:00
2fabcd1c29 feat(ui): live service add/remove on MazeNET Inspector
When the topology is active/degraded the Inspector switches services
chips into live controls: each chip gets a × button that DELETEs to
the W3 endpoint, and a dashed '+ ADD' chip opens a typeahead picker
fed by useServiceRegistry().perDecky.

Pending topologies still use the existing design-time path
(onRemoveService → editor.updateDecky); the Inspector picks based on
topologyStatus, so an operator never accidentally hits a live API
call against a topology that isn't deployed yet.

The mutation handlers in MazeNET.tsx hit POST/DELETE
/api/v1/topologies/{id}/deckies/{name}/services{,/svc} and
optimistically apply the response's services list to local state.
Cross-tab reconciliation rides on the SSE forwarder shipped in the
follow-up commit.
2026-04-28 23:11:02 -04:00
06f208c86e feat: surface fleet_singleton flag on /topologies/services
Adds a fleet_singletons array to ServiceCatalogResponse so per-decky
add UIs can filter out services like LLMNR that run once fleet-wide
(and would 422 server-side at the live add endpoint).

The existing 'services: list[str]' field is unchanged for back-compat
with MazeNET/useMazeApi.ts:257; the new field is additive.

decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a
module-scoped cache (registry only changes on BYOS install / plugin
drop, neither of which happens mid-session) and exposes a precomputed
.perDecky list so consumers don't need to re-derive the diff.
2026-04-28 23:08:29 -04:00
4287e94deb feat(ui): file drops tab on CanaryTokens
CanaryTokens.tsx grows a third tab — File drops — alongside Tokens
and Blobs.  The page now covers every 'admin landed bytes on a decky'
operation in one place.

FileDropModal mirrors the canary CreateModal's shape: Fleet/MazeNET
toggle, topology+decky picker, absolute-path validation matching the
backend (DeckyFileDropRequest rejects relative + ..-traversal), mode
+ mtime offset inputs, and a -1w preset for backdating.  FileReader →
data URL → strip prefix → POST /api/v1/deckies/files.

The list is local-only (localStorage, capped at 200 entries).  W2's
backend doesn't persist drops by design — the endpoint is for staging
payloads, not as an audit trail.  CLEAR LIST button on the tab; no
DELETE button on rows since the local entry doesn't track whether the
file is still there (an attacker may have moved it).

Alt+D shortcut joins Alt+C; alt-key only per the Linux-meta-key rule.
2026-04-28 23:06:53 -04:00
c942d4d333 feat(ui): scope canary tokens to MazeNET topology deckies
CanaryTokens.tsx grows a Fleet/MazeNET toggle in the create modal.  In
topology mode we hydrate /topologies?status=active for the topology
picker, then GET /topologies/{id} on selection to repopulate the decky
picker — topology deckies have a different shape than fleet's /deckies
endpoint.

The tokens table gains a SCOPE column (chip: 'fleet' / 'topology'),
and a third filter dropdown alongside state.  The drawer's metadata
section shows a Scope row with a clickable jump-link back to the
MazeNET view at the right topology.

CanaryTokenRow grows a topology_id field so the drawer/list can
discriminate without re-fetching.
2026-04-28 23:04:13 -04:00
6ac8cac908 feat(deckies): live service add/remove without full redeploy
decnet.engine.services_live exposes add_service / remove_service for
both fleet and topology decky scopes.  The host's _compose() wrapper
already supported per-service targeting (up --no-deps -d <svc>,
stop, rm -f); what was missing was the orchestration around it:

* add: validate against decnet.services.registry (rejects unknown +
  fleet_singleton); persist the new services list; re-render the
  per-scope compose file (so future redeploys reflect the change);
  run docker compose up -d --no-deps --build <decky>-<svc>.
* remove: stop + rm -f the service container; persist; re-render
  compose so a future up -d doesn't bring it back.

Both publish decky.<name>.service.added / .removed on the bus, with
the post-mutation services list.  Topic constants added to
decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md
ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored).

Four new admin endpoints:

* POST/DELETE /api/v1/deckies/{name}/services{,/svc}
* POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc}

ServiceMutationError messages are mapped at the API boundary to 404
(decky/topology missing), 409 (idempotency violation), 422 (unknown
or fleet_singleton service).
2026-04-28 22:51:42 -04:00
0bc4b05c73 feat(deckies): generic file drops on fleet + MazeNET deckies
Extracts the docker-exec-with-base64-stdin pattern out of canary/planter
and orchestrator/drivers/ssh into a shared decnet.decky_io package.
Both consumers now delegate; the canary planter test still proves the
contract end-to-end.

Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops.
Container resolution is shared with the canary path: topology_id absent
means fleet (<name>-ssh), present routes through resolve_decky_container
which picks <name>-ssh when the topology decky exposes ssh, else the
topology base container decnet_t_<id8>_<name>.

Path validation rejects relative paths and '..' traversal at the request
model layer.  Bad base64 → 400; unknown topology → 404; decky not in
topology → 422; docker exec failure → 409.
2026-04-28 22:43:34 -04:00
3fe999d706 feat(canary): allow custom canaries on MazeNET deckies via API
POST /api/v1/canary/tokens grows an optional topology_id field.  When
present, the server hydrates the topology, validates the named decky is
in it, and resolves the docker container via
planter.resolve_topology_container — <name>-ssh if the decky exposes ssh,
else the topology base container.  Absent ⇒ fleet semantics, unchanged.

The token row gets a nullable topology_id column (no migration helper
per pre-v1 policy).  GET /api/v1/canary/tokens accepts ?topology_id= as
a filter.  DELETE re-resolves the container at revoke time so a
redeployed topology is still reachable.

422 when the named decky isn't in the topology; 404 when the topology
itself doesn't exist.
2026-04-28 22:34:45 -04:00
5802de1f86 feat(canary): seed baseline canaries on MazeNET deckies
Topology deploys now plant the configured canary baseline set on every
decky in the topology, mirroring the fleet-deploy hook. Containers are
resolved via resolve_topology_container — <decky>-ssh when the decky
exposes an ssh service, else the topology base container
decnet_t_<id8>_<decky>.

The planter's plant/revoke/seed_baseline grow an optional container=
kwarg; default preserves the fleet <name>-ssh resolution.
2026-04-28 22:30:11 -04:00
04b0637c24 feat(bounty): wire artifact download into BountyInspector drawer
The Vault page already shows file drops and stored mail (e3ddeb0) but
the inspector drawer had no download button — only the live-feed
ArtifactDrawer/MailDrawer offered raw byte retrieval. Add a DOWNLOAD
RAW action to BountyInspector that fires when bounty_type=artifact,
hitting /artifacts/{decky}/{stored_as}?service=<svc> with the bounty's
own service field (ssh or smtp). Mirrors ArtifactDrawer's blob handling
and 400/403/404 error mapping.

Also widen the icon/label vocabulary: artifact bounties get FileText
(file drops) or Mail (message_stored) instead of the generic Package,
and the inspector header chip mirrors the change.
2026-04-28 22:03:58 -04:00
e3ddeb0395 feat(bounty): surface file drops and stored mail in the Vault
The Bounty Vault page only read from the Bounty table, but
inotifywait-captured file drops (event_type=file_captured) and SMTP
quarantined messages (event_type=message_stored) were only landing in
the Logs table. AttackerDetail's tabs queried logs directly, so they
showed up per-attacker but were invisible on the global Vault page.

Mirror both events into Bounty as bounty_type=artifact with
payload.kind ∈ {file, mail} so the existing dedup
(bounty_type, attacker_ip, payload) collapses repeats by sha256. Add an
ARTIFACTS segment to the Vault filter row, plus dedicated render
branches: file drops show orig_path + size + writer attribution; mail
shows subject + From + attachment count + size, with the Mail icon
distinguishing them from FileText for file drops.

Forward-only — existing logs stay where they are. A backfill pass would
be straightforward (read Log WHERE event_type IN ('file_captured',
'message_stored') and feed each row through _extract_bounty) but is out
of scope here.
2026-04-28 19:42:54 -04:00
88f276e9e7 feat(collector): drop native unix daemon syslog from ingestion
sshd, pam_unix, sudo, CRON, systemd, kernel, rsyslogd, and dbus-daemon
all share the SSH/telnet decky containers and write to the same syslog
socket as DECNET's own emitters. Their output was being parsed and
ingested into the JSON stream, the dashboard, and the profiler — pure
noise: sshd's "Failed password for root from X" duplicates the
auth-helper's structured auth_attempt event, pam_unix repeats it again,
CRON/systemd say nothing about attacker behavior.

Drop these APP-NAMEs in _should_ingest before the JSON write and bus
publish. Raw .log file still captures everything for forensics. The
denylist is overridable with DECNET_COLLECTOR_DROP_APPS so operators
can extend it without code changes.
2026-04-28 19:21:39 -04:00
6055f9c837 fix(deckies): set MSGID=command on bash PROMPT_COMMAND syslog lines
Add --rfc5424 --msgid command to the logger invocation in SSH and telnet
decky bashrc. MSGID arrives as "command" instead of NIL, which is what
the profiler's _COMMAND_EVENT_TYPES filter expects. The parser heuristic
shipped in d4591b3 stays as a safety net for any future emitter that
forgets the flags or for inflight pre-rebuild containers.
2026-04-28 19:12:11 -04:00
d4591b38dc fix(profiler): aggregate bash PROMPT_COMMAND lines into attacker profile
SSH/telnet decky containers emit shell commands via `logger -t bash "CMD …"`
which produces RFC 5424 lines with MSGID=NIL. Both parsers were leaving
event_type="-", so the behavioral profiler's `_COMMAND_EVENT_TYPES` filter
silently dropped them — the IP profile existed but no command transcripts
or artifacts. Confirmed in the wild: 44/48 events from one attacker were
event_type="-".

Rewrite event_type to "command" in both parsers when MSGID=NIL and the
msg starts with "CMD ". Correlation parser also extracts the cmd= payload
into fields["command"] so the profiler can build the transcript; collector
parser leaves fields={} to avoid duplicate pills in the dashboard.
2026-04-28 19:09:41 -04:00
862e4dbb31 merge: testing → main (reconcile 2-week divergence) 2026-04-28 18:36:00 -04:00
DECNET CI
499836c9e4 chore: auto-release v0.2 [skip ci] 2026-04-13 11:50:02 +00:00
bb9c782c41 Merge pull request 'tofix/merge-testing-to-main' (#6) from tofix/merge-testing-to-main into main
Some checks failed
Release / Auto-tag release (push) Successful in 16s
Release / Build, scan & push conpot (push) Failing after 4m22s
Release / Build, scan & push elasticsearch (push) Failing after 4m37s
Release / Build, scan & push llmnr (push) Failing after 4m32s
Release / Build, scan & push mongodb (push) Failing after 4m35s
Release / Build, scan & push ldap (push) Failing after 4m44s
Release / Build, scan & push docker_api (push) Failing after 4m57s
Release / Build, scan & push imap (push) Failing after 4m50s
Release / Build, scan & push http (push) Failing after 4m59s
Release / Build, scan & push mssql (push) Failing after 4m28s
Release / Build, scan & push mqtt (push) Failing after 4m38s
Release / Build, scan & push ftp (push) Failing after 5m8s
Release / Build, scan & push k8s (push) Failing after 5m3s
Release / Build, scan & push mysql (push) Failing after 1m56s
Release / Build, scan & push redis (push) Has started running
Release / Build, scan & push rdp (push) Has been cancelled
Release / Build, scan & push pop3 (push) Has been cancelled
Release / Build, scan & push postgres (push) Has been cancelled
Release / Build, scan & push sip (push) Has started running
Release / Build, scan & push smb (push) Has started running
Release / Build, scan & push smtp (push) Has started running
Release / Build, scan & push snmp (push) Has started running
Release / Build, scan & push ssh (push) Has started running
Release / Build, scan & push telnet (push) Has started running
Release / Build, scan & push tftp (push) Has started running
Release / Build, scan & push vnc (push) Has started running
Reviewed-on: #6
2026-04-13 13:49:47 +02:00
597854cc06 Merge branch 'merge/testing-to-main' into tofix/merge-testing-to-main
Some checks failed
PR Gate / Lint (ruff) (pull_request) Successful in 17s
PR Gate / SAST (bandit) (pull_request) Successful in 23s
PR Gate / Dependency audit (pip-audit) (pull_request) Successful in 36s
PR Gate / Test (pytest) (3.12) (pull_request) Failing after 1m0s
PR Gate / Test (pytest) (3.11) (pull_request) Failing after 1m10s
2026-04-13 07:48:43 -04:00
3b4b0a1016 merge: resolve conflicts between testing and main (remove tracked settings, fix pyproject deps) 2026-04-13 07:48:37 -04:00
DECNET CI
8ad3350d51 ci: auto-merge dev → testing [skip ci] 2026-04-13 05:55:46 +00:00
23ec470988 Merge pull request 'fix/merge-testing-to-main' (#4) from fix/merge-testing-to-main into main
Some checks failed
Release / Auto-tag release (push) Failing after 8s
Release / Build, scan & push cowrie (push) Has been skipped
Release / Build, scan & push docker_api (push) Has been skipped
Release / Build, scan & push elasticsearch (push) Has been skipped
Release / Build, scan & push ftp (push) Has been skipped
Release / Build, scan & push http (push) Has been skipped
Release / Build, scan & push imap (push) Has been skipped
Release / Build, scan & push k8s (push) Has been skipped
Release / Build, scan & push ldap (push) Has been skipped
Release / Build, scan & push llmnr (push) Has been skipped
Release / Build, scan & push mongodb (push) Has been skipped
Release / Build, scan & push mqtt (push) Has been skipped
Release / Build, scan & push mssql (push) Has been skipped
Release / Build, scan & push mysql (push) Has been skipped
Release / Build, scan & push pop3 (push) Has been skipped
Release / Build, scan & push postgres (push) Has been skipped
Release / Build, scan & push rdp (push) Has been skipped
Release / Build, scan & push real_ssh (push) Has been skipped
Release / Build, scan & push redis (push) Has been skipped
Release / Build, scan & push sip (push) Has been skipped
Release / Build, scan & push smb (push) Has been skipped
Release / Build, scan & push smtp (push) Has been skipped
Release / Build, scan & push snmp (push) Has been skipped
Release / Build, scan & push tftp (push) Has been skipped
Release / Build, scan & push vnc (push) Has been skipped
Reviewed-on: #4
2026-04-12 10:10:19 +02:00
4064e19af1 merge: resolve conflicts between testing and main
Some checks failed
PR Gate / Lint (ruff) (pull_request) Failing after 11s
PR Gate / Test (pytest) (3.11) (pull_request) Failing after 10s
PR Gate / Test (pytest) (3.12) (pull_request) Failing after 10s
PR Gate / SAST (bandit) (pull_request) Successful in 12s
PR Gate / Dependency audit (pip-audit) (pull_request) Failing after 13s
2026-04-12 04:09:17 -04:00
DECNET CI
ac4e5e1570 ci: auto-merge dev → testing
All checks were successful
CI / Lint (ruff) (push) Successful in 11s
CI / Test (pytest) (3.11) (push) Successful in 1m9s
CI / Test (pytest) (3.12) (push) Successful in 1m14s
CI / SAST (bandit) (push) Successful in 12s
CI / Dependency audit (pip-audit) (push) Successful in 21s
CI / Merge dev → testing (push) Has been skipped
CI / Open PR to main (push) Successful in 6s
PR Gate / Lint (ruff) (pull_request) Successful in 11s
PR Gate / Test (pytest) (3.11) (pull_request) Successful in 1m13s
PR Gate / Test (pytest) (3.12) (pull_request) Successful in 1m12s
PR Gate / SAST (bandit) (pull_request) Successful in 13s
PR Gate / Dependency audit (pip-audit) (pull_request) Successful in 21s
2026-04-12 07:53:07 +00:00
eb40be2161 chore: split dev and normal dependencies in pyproject.toml 2026-04-08 00:09:15 -04:00
0927d9e1e8 Modified: DEVELOPMENT.md 2026-04-06 12:03:36 -04:00
9c81fb4739 revert f64c251a9e
revert revert f8a9f8fc64

revert Added: modified notes. Finished CI/CD pipeline.
2026-04-06 18:02:28 +02:00
e4171789a8 Added: documentation about the deaddeck archetype and how to run it. 2026-04-06 11:51:24 -04:00
f64c251a9e revert f8a9f8fc64
revert Added: modified notes. Finished CI/CD pipeline.
2026-04-06 17:15:32 +02:00
c56c9fe667 Merge pull request 'Auto PR: dev → main' (#2) from dev into main
Some checks failed
Release / Auto-tag release (push) Successful in 14s
Release / Build, scan & push cowrie (push) Failing after 41s
Release / Build, scan & push docker_api (push) Failing after 30s
Release / Build, scan & push elasticsearch (push) Failing after 30s
Release / Build, scan & push ftp (push) Failing after 32s
Release / Build, scan & push http (push) Failing after 32s
Release / Build, scan & push imap (push) Failing after 31s
Release / Build, scan & push k8s (push) Failing after 32s
Release / Build, scan & push ldap (push) Failing after 30s
Release / Build, scan & push llmnr (push) Failing after 33s
Release / Build, scan & push mongodb (push) Failing after 32s
Release / Build, scan & push mqtt (push) Failing after 33s
Release / Build, scan & push mssql (push) Failing after 31s
Release / Build, scan & push mysql (push) Failing after 33s
Release / Build, scan & push pop3 (push) Failing after 33s
Release / Build, scan & push postgres (push) Failing after 32s
Release / Build, scan & push rdp (push) Failing after 32s
Release / Build, scan & push real_ssh (push) Failing after 33s
Release / Build, scan & push redis (push) Failing after 33s
Release / Build, scan & push sip (push) Failing after 33s
Release / Build, scan & push smb (push) Failing after 31s
Release / Build, scan & push smtp (push) Failing after 31s
Release / Build, scan & push snmp (push) Failing after 31s
Release / Build, scan & push tftp (push) Failing after 31s
Release / Build, scan & push vnc (push) Failing after 33s
Reviewed-on: #2
2026-04-06 17:11:54 +02:00
897f498bcd Merge dev into main: resolve conflicts, keep tests out of main
Some checks failed
Release / Auto-tag release (push) Successful in 14s
Release / Build, scan & push cowrie (push) Failing after 6m9s
Release / Build, scan & push docker_api (push) Failing after 31s
Release / Build, scan & push elasticsearch (push) Failing after 30s
Release / Build, scan & push ftp (push) Failing after 30s
Release / Build, scan & push http (push) Failing after 33s
Release / Build, scan & push imap (push) Failing after 30s
Release / Build, scan & push k8s (push) Failing after 30s
Release / Build, scan & push ldap (push) Failing after 33s
Release / Build, scan & push llmnr (push) Failing after 29s
Release / Build, scan & push mongodb (push) Failing after 30s
Release / Build, scan & push mqtt (push) Failing after 30s
Release / Build, scan & push mssql (push) Failing after 30s
Release / Build, scan & push mysql (push) Failing after 30s
Release / Build, scan & push pop3 (push) Failing after 32s
Release / Build, scan & push postgres (push) Failing after 29s
Release / Build, scan & push rdp (push) Failing after 29s
Release / Build, scan & push real_ssh (push) Failing after 31s
Release / Build, scan & push redis (push) Failing after 29s
Release / Build, scan & push sip (push) Failing after 30s
Release / Build, scan & push smb (push) Failing after 32s
Release / Build, scan & push smtp (push) Failing after 31s
Release / Build, scan & push snmp (push) Failing after 29s
Release / Build, scan & push tftp (push) Failing after 29s
Release / Build, scan & push vnc (push) Failing after 30s
2026-04-04 18:00:17 -03:00
92e06cb193 Add release workflow for auto-tagging and Docker image builds
Some checks failed
Release / Auto-tag release (push) Failing after 3s
Release / Build & push cowrie (push) Has been skipped
Release / Build & push docker_api (push) Has been skipped
Release / Build & push elasticsearch (push) Has been skipped
Release / Build & push ftp (push) Has been skipped
Release / Build & push http (push) Has been skipped
Release / Build & push imap (push) Has been skipped
Release / Build & push k8s (push) Has been skipped
Release / Build & push ldap (push) Has been skipped
Release / Build & push llmnr (push) Has been skipped
Release / Build & push mongodb (push) Has been skipped
Release / Build & push mqtt (push) Has been skipped
Release / Build & push mssql (push) Has been skipped
Release / Build & push mysql (push) Has been skipped
Release / Build & push pop3 (push) Has been skipped
Release / Build & push postgres (push) Has been skipped
Release / Build & push rdp (push) Has been skipped
Release / Build & push real_ssh (push) Has been skipped
Release / Build & push redis (push) Has been skipped
Release / Build & push sip (push) Has been skipped
Release / Build & push smb (push) Has been skipped
Release / Build & push smtp (push) Has been skipped
Release / Build & push snmp (push) Has been skipped
Release / Build & push tftp (push) Has been skipped
Release / Build & push vnc (push) Has been skipped
2026-04-04 17:16:53 -03:00
7ad7e1e53b main: remove tests and pytest dependency 2026-04-04 16:28:33 -03:00
1719 changed files with 107160 additions and 10409 deletions

20
.gitignore vendored
View File

@@ -1,5 +1,6 @@
.venv/
.venv*/
docker-compose.yaml
.311/
.3[0-9][0-9]/
logs/
@@ -51,3 +52,22 @@ schem
# pydeps-style dependency graph dumps from local analysis runs.
deps.txt
# Node modules vendored under decnet/canary/ for the obfuscator helper.
# The package.json is the source of truth; modules are reinstalled at
# build/deploy time.
node_modules/
package-lock.json
# TTP rule-precision corpus pulled from prod sqlite. Real attacker
# payloads — operator-only artifact. The synthetic ``seed_*.jsonl``
# files alongside ARE committed and exercise the harness in CI.
tests/ttp/rule_precision/corpus/*.jsonl
tests/ttp/rule_precision/corpus/seed_*.jsonl
threatfox-api.json
# MITRE ATT&CK STIX bundle — 50 MB, fetched at runtime via attack_stix.py
enterprise-attack-*.json
# pytest failure dump files
testfail

17
COPYRIGHT Normal file
View File

@@ -0,0 +1,17 @@
DECNET - Deception Network
Copyright (C) 2026 Samuel Paschuan <samsam70000@gmail.com>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public
License along with this program. If not, see <https://www.gnu.org/licenses/>.
SPDX-License-Identifier: AGPL-3.0-or-later

141
LICENSE
View File

@@ -1,5 +1,5 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
@@ -7,17 +7,15 @@
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The GNU Affero General Public License is a free, copyleft license for
software and other kinds of works, specifically designed to ensure
cooperation with the community in the case of network server software.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
our General Public Licenses are intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
software for all its users.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
@@ -26,44 +24,34 @@ them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
Developers that use our General Public Licenses protect your rights
with two steps: (1) assert copyright on the software, and (2) offer
you this License which gives you legal permission to copy, distribute
and/or modify the software.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
A secondary benefit of defending all users' freedom is that
improvements made in alternate versions of the program, if they
receive widespread use, become available for other developers to
incorporate. Many developers of free software are heartened and
encouraged by the resulting cooperation. However, in the case of
software used on network servers, this result may fail to come about.
The GNU General Public License permits making a modified version and
letting the public access it on a server without ever releasing its
source code to the public.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
The GNU Affero General Public License is designed specifically to
ensure that, in such cases, the modified source code becomes available
to the community. It requires the operator of a network server to
provide the source code of the modified version running there to the
users of that server. Therefore, public use of a modified version, on
a publicly accessible server, gives the public access to the source
code of the modified version.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
An older license, called the Affero General Public License and
published by Affero, was designed to accomplish similar goals. This is
a different license, not a version of the Affero GPL, but Affero has
released a new version of the Affero GPL which permits relicensing under
this license.
The precise terms and conditions for copying, distribution and
modification follow.
@@ -72,7 +60,7 @@ modification follow.
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"This License" refers to version 3 of the GNU Affero General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
@@ -549,35 +537,45 @@ to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
13. Remote Network Interaction; Use with the GNU General Public License.
Notwithstanding any other provision of this License, if you modify the
Program, your modified version must prominently offer all users
interacting with it remotely through a computer network (if your version
supports such interaction) an opportunity to receive the Corresponding
Source of your version by providing access to the Corresponding Source
from a network server at no charge, through some standard or customary
means of facilitating copying of software. This Corresponding Source
shall include the Corresponding Source for any work covered by version 3
of the GNU General Public License that is incorporated pursuant to the
following paragraph.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
under version 3 of the GNU General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
but the work with which it is combined will remain governed by version
3 of the GNU General Public License.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
the GNU Affero General Public License from time to time. Such new versions
will be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Program specifies that a certain numbered version of the GNU Affero General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
GNU Affero General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
versions of the GNU Affero General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
@@ -635,40 +633,29 @@ the "copyright" line and a pointer to where the full notice is found.
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
GNU Affero General Public License for more details.
You should have received a copy of the GNU General Public License
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
If your software can interact with users remotely through a computer
network, you should also make sure that it provides a way for users to
get its source. For example, if your program is a web application, its
interface could display a "Source" link that leads users to an archive
of the code. There are many ways you could offer source, and different
solutions will be better for different programs; see section 13 for the
specific requirements.
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
For more information on this, and how to apply and follow the GNU AGPL, see
<https://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.

261
Makefile Normal file
View File

@@ -0,0 +1,261 @@
PYTEST := .311/bin/pytest
FAIL_FAST ?= 1
NO_CACHE ?= 0
ARGS :=
# addopts in pyproject.toml already provides -v -q -x -n 4 --dist load.
# Unit suites inherit that; special suites clear it with --override-ini.
UNIT_FLAGS := --timeout=30 --timeout-method=thread
SEQ_FLAGS := --override-ini="addopts=-v -x" -n logical --timeout=120 --timeout-method=thread
FUZZ_FLAGS := --override-ini="addopts=-v -x" -n logical -m fuzz \
--ignore=tests/api/test_schemathesis.py \
--ignore=tests/api/test_schemathesis_agent.py \
--ignore=tests/api/test_schemathesis_swarm.py \
--ignore=tests/api/test_schemathesis_ttp.py
SCHEMA_QUICK ?= 0
SCHEMA_FLAGS := --override-ini="addopts=-v -x" -n 4 -m fuzz --timeout=600 --timeout-method=thread
BENCH_FLAGS := --override-ini="addopts=-v" -p no:xdist --benchmark-only -m bench
# ── Unit suites (xdist, 30s timeout) ─────────────────────────────────────────
.PHONY: test-core
test-core:
$(PYTEST) tests/core tests/config tests/factories tests/fixtures $(UNIT_FLAGS) $(ARGS)
.PHONY: test-web
test-web:
$(PYTEST) tests/web tests/services $(UNIT_FLAGS) $(ARGS)
.PHONY: test-db
test-db:
$(PYTEST) tests/db tests/vectorstore $(UNIT_FLAGS) $(ARGS)
.PHONY: test-bus
test-bus:
$(PYTEST) tests/bus tests/logging tests/telemetry $(UNIT_FLAGS) $(ARGS)
.PHONY: test-ttp
test-ttp:
$(PYTEST) tests/ttp $(UNIT_FLAGS) $(ARGS)
.PHONY: test-intel
test-intel:
$(PYTEST) tests/intel tests/asn tests/geoip $(UNIT_FLAGS) $(ARGS)
.PHONY: test-analysis
test-analysis:
$(PYTEST) tests/clustering tests/correlation $(UNIT_FLAGS) $(ARGS)
.PHONY: test-infra
test-infra:
$(PYTEST) tests/agent tests/collector tests/sniffer tests/profiler $(UNIT_FLAGS) $(ARGS)
.PHONY: test-fleet
test-fleet:
$(PYTEST) tests/fleet tests/swarm tests/topology tests/orchestrator tests/deploy tests/updater $(UNIT_FLAGS) $(ARGS)
.PHONY: test-cli
test-cli:
$(PYTEST) tests/cli tests/engine tests/mutator tests/realism $(UNIT_FLAGS) $(ARGS)
.PHONY: test-features
test-features:
$(PYTEST) tests/canary tests/artifacts tests/webhook tests/decky_io tests/prober $(UNIT_FLAGS) $(ARGS)
# ── Go and React suites ───────────────────────────────────────────────────────
_GO_MODULES := \
decnet/templates/_caddy_modules/decnetfp \
decnet/templates/http/_caddy_modules/decnetfp \
decnet/templates/https/_caddy_modules/decnetfp
.PHONY: test-go
test-go:
@failed=""; \
for mod in $(_GO_MODULES); do \
echo "=== go test: $$mod ==="; \
if (cd "$$mod" && go test ./...); then \
echo "[PASS] $$mod"; \
else \
echo "[FAIL] $$mod"; \
failed="$$failed $$mod"; \
if [ "$(FAIL_FAST)" = "1" ]; then exit 1; fi; \
fi; \
done; \
[ -z "$$failed" ]
.PHONY: test-react
test-react:
cd decnet_web && npm run test:run $(ARGS)
# ── Special suites (sequential, longer timeout) ───────────────────────────────
.PHONY: test-live
test-live:
$(PYTEST) tests/live -m live $(SEQ_FLAGS) $(ARGS)
.PHONY: test-api
test-api:
$(PYTEST) tests/api $(SEQ_FLAGS) $(ARGS)
.PHONY: test-stress
test-stress:
$(PYTEST) tests/stress -m stress $(SEQ_FLAGS) $(ARGS)
.PHONY: test-service
test-service:
$(PYTEST) tests/service_testing $(SEQ_FLAGS) $(ARGS)
.PHONY: test-fuzz
test-fuzz:
$(PYTEST) $(FUZZ_FLAGS) $(ARGS)
.PHONY: test-schema
test-schema:
SCHEMA_QUICK=$(SCHEMA_QUICK) $(PYTEST) \
tests/api/test_schemathesis.py \
tests/api/test_schemathesis_agent.py \
tests/api/test_schemathesis_swarm.py \
tests/api/test_schemathesis_ttp.py \
$(SCHEMA_FLAGS) $(ARGS)
.PHONY: test-bench
test-bench:
$(PYTEST) tests/perf $(BENCH_FLAGS) $(ARGS)
.PHONY: test-docker
test-docker:
DECNET_LIVE_DOCKER=1 $(PYTEST) tests/docker -m docker $(SEQ_FLAGS) $(ARGS)
# ── Static analysis ───────────────────────────────────────────────────────────
.PHONY: test-mypy
test-mypy:
.311/bin/mypy decnet --ignore-missing-imports --no-error-summary
.PHONY: test-bandit
test-bandit:
.311/bin/bandit -r decnet -c pyproject.toml
.PHONY: test-vulture
test-vulture:
.311/bin/vulture decnet --min-confidence 80
.PHONY: test-pip-audit
test-pip-audit:
.311/bin/pip-audit
# ── Composite: all suites ─────────────────────────────────────────────────────
_ALL_SUITES := core web db bus ttp intel analysis infra fleet cli features \
go react \
live api schema stress service fuzz bench docker \
mypy bandit vulture pip-audit
.PHONY: test-all test
test-all test:
@failed=""; \
for suite in $(_ALL_SUITES); do \
echo ""; \
echo "══════════════════════════ $$suite ══════════════════════════"; \
if $(MAKE) --no-print-directory test-$$suite ARGS="$(ARGS)"; then \
echo "[PASS] $$suite"; \
else \
echo "[FAIL] $$suite"; \
failed="$$failed $$suite"; \
if [ "$(FAIL_FAST)" = "1" ]; then \
echo "Stopping at first failure. Use FAIL_FAST=0 to run all suites."; \
exit 1; \
fi; \
fi; \
done; \
if [ -n "$$failed" ]; then \
echo ""; \
echo "Failed:$$failed"; \
exit 1; \
fi; \
echo ""; \
echo "All suites passed."
# ── Decky image pre-build ─────────────────────────────────────────────────────
_DECKY_TEMPLATES := \
conpot docker_api elasticsearch ftp http https imap k8s ldap \
llmnr mongodb mqtt mssql mysql pop3 postgres rdp redis sip smb smtp \
sniffer snmp ssh telnet tftp vnc
.PHONY: build-all
build-all:
@failed=""; \
for svc in $(_DECKY_TEMPLATES); do \
echo ""; \
echo "══════════════════════════ $$svc ══════════════════════════"; \
_nc=""; \
if [ "$(NO_CACHE)" = "1" ]; then _nc="--no-cache"; fi; \
if DOCKER_BUILDKIT=1 docker build $$_nc \
-t decnet/$$svc:latest \
decnet/templates/$$svc; then \
echo "[BUILT] $$svc"; \
else \
echo "[FAIL] $$svc"; \
failed="$$failed $$svc"; \
if [ "$(FAIL_FAST)" = "1" ]; then \
echo "Stopping at first failure. Use FAIL_FAST=0 to build all."; \
exit 1; \
fi; \
fi; \
done; \
if [ -n "$$failed" ]; then \
echo ""; \
echo "Failed:$$failed"; \
exit 1; \
fi; \
echo ""; \
echo "All decky images built."
.PHONY: help
help:
@echo "Unit suites (xdist, 30s timeout):"
@echo " make test-core tests/core + config + factories + fixtures"
@echo " make test-web tests/web + services"
@echo " make test-db tests/db + vectorstore"
@echo " make test-bus tests/bus + logging + telemetry"
@echo " make test-ttp tests/ttp"
@echo " make test-intel tests/intel + asn + geoip"
@echo " make test-analysis tests/clustering + correlation"
@echo " make test-infra tests/agent + collector + sniffer + profiler"
@echo " make test-fleet tests/fleet + swarm + topology + orchestrator + deploy + updater"
@echo " make test-cli tests/cli + engine + mutator + realism"
@echo " make test-features tests/canary + artifacts + webhook + decky_io + prober"
@echo ""
@echo "Go / React suites:"
@echo " make test-go go test ./... in each Caddy module variant"
@echo " make test-react vitest run in decnet_web"
@echo ""
@echo "Special suites (sequential, 120s timeout):"
@echo " make test-live tests/live"
@echo " make test-api tests/api (schemathesis)"
@echo " make test-stress tests/stress"
@echo " make test-service tests/service_testing"
@echo " make test-schema schemathesis contract tests (-m fuzz, xdist logical)"
@echo " make test-schema SCHEMA_QUICK=1 same, capped at 100 examples per test"
@echo " make test-fuzz hypothesis fuzz (all normal dirs, -m fuzz, skips schemathesis files)"
@echo " make test-bench tests/perf"
@echo " make test-docker tests/docker (needs DECNET_LIVE_DOCKER=1)"
@echo ""
@echo "Static analysis:"
@echo " make test-mypy mypy type check on decnet/"
@echo " make test-bandit bandit security scan on decnet/"
@echo " make test-vulture vulture dead code scan (>=80% confidence)"
@echo " make test-pip-audit pip-audit dependency vulnerability scan"
@echo ""
@echo "Composites:"
@echo " make test-all ALL suites (unit + go + react + live + api + schema + fuzz + bench + stress + docker + static analysis)"
@echo " make test-all FAIL_FAST=0 same, report all failures instead of stopping"
@echo ""
@echo "Passthrough: make test-web ARGS='--lf -s'"
@echo ""
@echo "Decky images:"
@echo " make build-all build decnet/<svc>:latest for all 27 decky templates"
@echo " make build-all NO_CACHE=1 same, bypassing Docker layer cache"
@echo " make build-all FAIL_FAST=0 same, continue past failures"

861
README.md

File diff suppressed because it is too large Load Diff

0
bait/.gitkeep Normal file
View File

5
bait/README.md Normal file
View File

@@ -0,0 +1,5 @@
# bait/
Default operator-supplied email seed for IMAP/POP3 deckies. Drop `*.eml` and/or `*.json` files here; the IMAP/POP3 services bind-mount this dir read-only at `/var/spool/decnet-emails/seed` when no per-decky `email_seed` is configured. Entries concatenate onto the hardcoded bait baseline (additive to realism-engine output, never replacing).
JSON shape: list of dicts with required `from_addr`, `to_addr`, `subject`, `body`; optional `from_name`, `date`, `flags`. See `decnet/templates/imap/server.py` for the loader.

BIN
decnet.tar Normal file

Binary file not shown.

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""DECNET — honeypot deception-network framework.
This __init__ runs once, on the first `import decnet.*`. It seeds

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""DECNET worker agent — runs on every SWARM worker host.
Exposes an mTLS-protected FastAPI service the master's SWARM controller

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Worker-side FastAPI app.
Protected by mTLS at the ASGI/uvicorn transport layer: uvicorn is started
@@ -25,6 +26,7 @@ from contextlib import asynccontextmanager
from typing import Any, Optional
from fastapi import FastAPI, HTTPException
from fastapi.responses import JSONResponse
from pydantic import BaseModel, Field
import contextlib
@@ -181,6 +183,7 @@ class TeardownRequest(BaseModel):
class MutateRequest(BaseModel):
decky_id: str
services: list[str]
dry_run: bool = False
# ------------------------------------------------------------------ routes
@@ -197,15 +200,22 @@ async def status() -> dict:
@app.post(
"/deploy",
responses={500: {"description": "Deployer raised an exception materialising the config"}},
status_code=202,
responses={202: {"description": "Deploy accepted; runs in background; lifecycle deltas pushed via heartbeat"}},
)
async def deploy(req: DeployRequest) -> dict:
try:
await _exec.deploy(req.config, dry_run=req.dry_run, no_cache=req.no_cache)
except Exception as exc:
log.exception("agent.deploy failed")
raise HTTPException(status_code=500, detail=str(exc)) from exc
return {"status": "deployed", "deckies": len(req.config.deckies)}
"""Spawn the deploy in the background and return 202 immediately.
The master tracks per-decky completion via lifecycle deltas pushed on
the next heartbeat (one immediate push on completion, plus the
scheduled 30 s ticks as a fallback). Holding the request open across
a multi-minute compose build was the previous source of the wizard
API-hang."""
asyncio.create_task(
_exec.deploy_async(req.config, dry_run=req.dry_run, no_cache=req.no_cache),
name=f"deploy-{id(req)}",
)
return {"status": "accepted", "deckies": [d.name for d in req.config.deckies]}
@app.post(
@@ -307,14 +317,50 @@ async def topology_state() -> dict:
@app.post(
"/mutate",
responses={501: {"description": "Worker-side mutate not yet implemented"}},
status_code=202,
responses={
202: {"description": "Mutate accepted; runs in background; lifecycle delta pushed via heartbeat"},
404: {"description": "No active deployment, or unknown decky_id (dry_run validation only)"},
},
)
async def mutate(req: MutateRequest) -> dict:
# TODO: implement worker-side mutate. Currently the master performs
# mutation by re-sending a full /deploy with the updated DecnetConfig;
# this avoids duplicating mutation logic on the worker for v1. When
# ready, replace the 501 with a real redeploy-of-a-single-decky path.
raise HTTPException(
status_code=501,
detail="Per-decky mutate is performed via /deploy with updated services",
async def mutate(req: MutateRequest) -> Any:
"""Spawn the mutate in the background and return 202 immediately.
Master tracks completion via a lifecycle delta pushed on the next
heartbeat (immediate push on completion). ``dry_run`` is still
synchronous — it validates against the worker's current state and
returns the would-be services without spawning a task or touching
docker, so the wizard's preview path stays cheap."""
if req.dry_run:
from decnet.config import load_state
state = load_state()
if state is None:
raise HTTPException(
status_code=404,
detail="no active deployment on this worker",
)
cfg, _ = state
decky = next((d for d in cfg.deckies if d.name == req.decky_id), None)
if decky is None:
raise HTTPException(
status_code=404,
detail=f"decky {req.decky_id!r} not found in worker state",
)
return JSONResponse(
status_code=200,
content={
"status": "dry_run",
"decky_id": req.decky_id,
"services": list(req.services),
},
)
asyncio.create_task(
_exec.mutate_async(req.decky_id, list(req.services)),
name=f"mutate-{req.decky_id}",
)
return {
"status": "accepted",
"decky_id": req.decky_id,
"services": list(req.services),
}

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Thin adapter between the agent's HTTP endpoints and the existing
``decnet.engine.deployer`` code path.
@@ -80,6 +81,99 @@ async def deploy(config: DecnetConfig, dry_run: bool = False, no_cache: bool = F
await asyncio.to_thread(_deployer.deploy, config, dry_run, no_cache, False)
async def deploy_async(
config: DecnetConfig, *, dry_run: bool = False, no_cache: bool = False,
) -> None:
"""Background-task body for /deploy: run the deploy, then push a
lifecycle delta to the master so it observes terminal transitions
immediately rather than waiting for the next scheduled heartbeat.
Per-decky lifecycle deltas — master pivots them onto the matching
open DeckyLifecycle rows via the heartbeat handler. Errors are
captured and pushed as ``failed`` deltas; the task itself never
raises (a crashed task would just leave master rows wedged).
"""
from datetime import datetime, timezone
from decnet.agent.heartbeat import push_lifecycle_delta
decky_names = [d.name for d in config.deckies]
try:
await deploy(config, dry_run=dry_run, no_cache=no_cache)
except Exception as exc: # noqa: BLE001
log.exception("agent.deploy_async failed")
err = f"{type(exc).__name__}: {exc}"
deltas = [
{
"decky_name": name, "operation": "deploy",
"status": "failed", "error": err[:2000],
"completed_at": datetime.now(timezone.utc).isoformat(),
}
for name in decky_names
]
await push_lifecycle_delta(deltas)
return
deltas = [
{
"decky_name": name, "operation": "deploy",
"status": "succeeded",
"completed_at": datetime.now(timezone.utc).isoformat(),
}
for name in decky_names
]
await push_lifecycle_delta(deltas)
async def mutate_async(decky_id: str, services: list[str]) -> None:
"""Background-task body for /mutate. Same shape as deploy_async:
perform the work, then push a single lifecycle delta on
completion (success or failure)."""
import time
from datetime import datetime, timezone
from decnet.composer import write_compose
from decnet.config import load_state, save_state
from decnet.engine import _compose_with_retry
from decnet.agent.heartbeat import push_lifecycle_delta
def _delta(status: str, error: str | None = None) -> dict:
out = {
"decky_name": decky_id, "operation": "mutate",
"status": status,
"completed_at": datetime.now(timezone.utc).isoformat(),
}
if error is not None:
out["error"] = error[:2000]
return out
try:
state = load_state()
if state is None:
await push_lifecycle_delta(
[_delta("failed", "no active deployment on this worker")],
)
return
cfg, compose_path = state
decky = next((d for d in cfg.deckies if d.name == decky_id), None)
if decky is None:
await push_lifecycle_delta(
[_delta("failed", f"decky {decky_id!r} not found in worker state")],
)
return
decky.services = list(services)
decky.last_mutated = time.time()
save_state(cfg, compose_path)
write_compose(cfg, compose_path)
await asyncio.to_thread(
_compose_with_retry, "up", "-d", "--remove-orphans",
compose_file=compose_path,
)
except Exception as exc: # noqa: BLE001
log.exception("agent.mutate_async failed decky=%s", decky_id)
err = f"{type(exc).__name__}: {exc}"
await push_lifecycle_delta([_delta("failed", err)])
return
await push_lifecycle_delta([_delta("succeeded")])
async def teardown(decky_id: str | None = None) -> None:
log.info("agent.teardown decky_id=%s", decky_id)
await asyncio.to_thread(_deployer.teardown, decky_id)
@@ -194,7 +288,7 @@ async def self_destruct() -> None:
argv = ["/bin/bash", path]
spawn_kwargs = {"start_new_session": True}
subprocess.Popen( # nosec B603
subprocess.Popen( # type: ignore[call-overload] # nosec B603
argv,
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Agent → master liveness heartbeat loop.
Every ``INTERVAL_S`` seconds the worker posts ``executor.status()`` to
@@ -50,7 +51,11 @@ def _resolve_agent_dir() -> pathlib.Path:
return pki.DEFAULT_AGENT_DIR
async def _tick(client: httpx.AsyncClient, url: str, host_uuid: str, agent_version: str) -> None:
async def _build_body(
host_uuid: str,
agent_version: str,
lifecycle: Optional[list[dict]] = None,
) -> dict:
snap = await _exec.status()
body: dict = {
"host_uuid": host_uuid,
@@ -70,7 +75,13 @@ async def _tick(client: httpx.AsyncClient, url: str, host_uuid: str, agent_versi
store.close()
except Exception:
log.debug("heartbeat: topology state unavailable", exc_info=True)
if lifecycle:
body["lifecycle"] = lifecycle
return body
async def _tick(client: httpx.AsyncClient, url: str, host_uuid: str, agent_version: str) -> None:
body = await _build_body(host_uuid, agent_version)
resp = await client.post(url, json=body)
# 403 / 404 are terminal-ish — we still keep looping because an
# operator may re-enrol the host mid-session, but we log loudly so
@@ -121,7 +132,7 @@ def start() -> Optional[asyncio.Task]:
return None
try:
from decnet import __version__ as _v
from decnet import __version__ as _v # type: ignore[attr-defined]
agent_version = _v
except Exception:
agent_version = "unknown"
@@ -134,6 +145,59 @@ def start() -> Optional[asyncio.Task]:
return _task
async def push_lifecycle_delta(deltas: list[dict]) -> None:
"""Fire a one-off heartbeat POST carrying *deltas* in the
``lifecycle`` field. Each delta: ``{decky_name, operation, status,
error?, completed_at?}``.
Called by the agent executor on /deploy and /mutate completion so
the master observes the terminal transition immediately rather than
waiting up to ``INTERVAL_S`` for the next scheduled tick. Failures
are logged and swallowed; the next scheduled heartbeat carries the
same deltas via DB-side reconciliation, since the worker has no
durable per-row state to lose.
"""
from decnet.env import (
DECNET_HOST_UUID,
DECNET_MASTER_HOST,
DECNET_SWARMCTL_PORT,
)
if not deltas:
return
if not DECNET_HOST_UUID or not DECNET_MASTER_HOST:
log.debug("push_lifecycle_delta: identity unconfigured — skipping")
return
agent_dir = _resolve_agent_dir()
try:
ssl_ctx = build_worker_ssl_context(agent_dir)
except Exception:
log.exception("push_lifecycle_delta: SSL context unavailable")
return
try:
from decnet import __version__ as _v # type: ignore[attr-defined]
agent_version = _v
except Exception:
agent_version = "unknown"
url = f"https://{DECNET_MASTER_HOST}:{DECNET_SWARMCTL_PORT}/swarm/heartbeat"
try:
async with httpx.AsyncClient(verify=ssl_ctx, timeout=_TIMEOUT) as client:
body = await _build_body(
DECNET_HOST_UUID, agent_version, lifecycle=deltas,
)
resp = await client.post(url, json=body)
if resp.status_code not in (200, 204):
log.warning(
"lifecycle delta push rejected status=%d body=%s",
resp.status_code, resp.text[:200],
)
except Exception:
log.exception("push_lifecycle_delta failed — next scheduled tick will retry")
async def stop() -> None:
global _task
if _task is None:

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Worker-agent uvicorn launcher.
Starts ``decnet.agent.app:app`` over HTTPS with mTLS enforcement. The

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Agent-side topology apply/teardown/state primitives.
Wraps the compose + bridge machinery from :mod:`decnet.engine.deployer`
@@ -28,6 +29,7 @@ from decnet.engine.deployer import (
_compose_with_retry,
_teardown_order,
_topology_compose_path,
_topology_compose_project,
)
from decnet.logging import get_logger
from decnet.network import create_bridge_network, remove_bridge_network
@@ -59,6 +61,77 @@ def _topology_id(hydrated: dict[str, Any]) -> str:
return str(tid)
def _check_hash_and_validate(hydrated: dict[str, Any], version_hash: str) -> str:
"""Verify hash integrity and structural validity; return topology_id."""
local_hash = canonical_hash(hydrated)
if local_hash != version_hash:
raise HashMismatch(
f"master hash {version_hash!r} does not match agent hash "
f"{local_hash!r} — refusing to apply"
)
issues = _validate_topology(hydrated)
if _validation_errors(issues):
raise ValidationError(issues)
return _topology_id(hydrated)
async def _teardown_superseded(topology_id: str, store: TopologyStore) -> None:
"""Tear down the current topology if it differs from topology_id.
Master is authoritative — a different pinned topology (fully applied,
partially applied, or drifted) is torn down before the new apply proceeds.
Refusing with 409 would leave the agent stuck in a state only a human
could resolve.
"""
existing = store.current()
if existing is None or existing.topology_id == topology_id:
return
log.info(
"superseding topology %s with %s on master authority",
existing.topology_id, topology_id,
)
try:
await teardown(existing.topology_id, store)
except Exception as exc: # noqa: BLE001 — we still want to try applying
log.warning(
"best-effort teardown of superseded topology %s failed: %s",
existing.topology_id, exc,
)
# Hard-clear the store row so the new apply isn't blocked by a
# half-torn-down predecessor. Leftover docker objects surface via
# the next heartbeat's observed block.
store.clear(existing.topology_id)
def _materialise(hydrated: dict[str, Any], topology_id: str) -> None:
"""Create bridge networks, write compose file, and bring up containers.
Sync/blocking — callers must dispatch via asyncio.to_thread.
``--always-recreate-deps`` keeps service containers' netns shares
fresh: every decky service joins its base's netns via
``network_mode: container:<base>``, and that share is bound at
service start time. If a base is recreated (e.g. when ``ports:``
changes after toggling ``forwards_l3``) but compose decides the
services are unchanged, the services keep a stale netns FD
pointing at the destroyed base — they end up in an empty
namespace with only ``lo``, and external traffic hits a closed
port on the live base. Forcing dependents to recreate alongside
the base is the cheapest way to make this race impossible.
"""
compose_path = _topology_compose_path(topology_id)
compose_project = _topology_compose_project(topology_id)
client = docker.from_env()
for lan in hydrated["lans"]:
net_name = _topology_network_name(topology_id, lan["name"])
create_bridge_network(client, net_name, lan["subnet"], internal=not lan["is_dmz"])
write_topology_compose(hydrated, compose_path)
_compose_with_retry(
"up", "--build", "-d", "--always-recreate-deps",
compose_file=compose_path, project=compose_project,
)
async def apply(
hydrated: dict[str, Any],
version_hash: str,
@@ -73,76 +146,11 @@ async def apply(
Any docker / compose error propagates up; the endpoint maps it
to 500 and records the message on the store row.
"""
local_hash = canonical_hash(hydrated)
if local_hash != version_hash:
raise HashMismatch(
f"master hash {version_hash!r} does not match agent hash "
f"{local_hash!r} — refusing to apply"
)
issues = _validate_topology(hydrated)
if _validation_errors(issues):
raise ValidationError(issues)
topology_id = _topology_id(hydrated)
# Master is authoritative. If a different topology is pinned here
# — whether it fully applied, only partially applied (failure
# marker row + orphan containers), or drifted — teardown first,
# then accept the new one. Refusing with 409 would leave the
# agent stuck in a state only a human could resolve.
existing = store.current()
if existing is not None and existing.topology_id != topology_id:
log.info(
"superseding topology %s with %s on master authority",
existing.topology_id, topology_id,
)
try:
await teardown(existing.topology_id, store)
except Exception as exc: # noqa: BLE001 — we still want to try applying
log.warning(
"best-effort teardown of superseded topology %s failed: %s",
existing.topology_id, exc,
)
# Hard-clear the store row so the new apply isn't blocked
# by a half-torn-down predecessor. Leftover docker objects
# will surface via the next heartbeat's observed block.
store.clear(existing.topology_id)
lans = hydrated["lans"]
compose_path = _topology_compose_path(topology_id)
client = docker.from_env()
# Bridges + compose are sync/blocking; hop to a thread so we don't
# stall the event loop on a slow docker daemon.
def _materialise() -> None:
for lan in lans:
net_name = _topology_network_name(topology_id, lan["name"])
internal = not lan["is_dmz"]
create_bridge_network(
client, net_name, lan["subnet"], internal=internal
)
write_topology_compose(hydrated, compose_path)
# ``--always-recreate-deps`` keeps service containers' netns shares
# fresh: every decky service joins its base's netns via
# ``network_mode: container:<base>``, and that share is bound at
# service start time. If a base is recreated (e.g. when ``ports:``
# changes after toggling ``forwards_l3``) but compose decides the
# services are unchanged, the services keep a stale netns FD
# pointing at the destroyed base — they end up in an empty
# namespace with only ``lo``, and external traffic hits a closed
# port on the live base. Forcing dependents to recreate alongside
# the base is the cheapest way to make this race impossible.
_compose_with_retry(
"up", "--build", "-d", "--always-recreate-deps",
compose_file=compose_path,
)
await asyncio.to_thread(_materialise)
topology_id = _check_hash_and_validate(hydrated, version_hash)
await _teardown_superseded(topology_id, store)
await asyncio.to_thread(_materialise, hydrated, topology_id)
store.put(topology_id, version_hash, hydrated)
log.info(
"topology %s applied on agent (%d LANs)", topology_id, len(lans)
)
log.info("topology %s applied on agent (%d LANs)", topology_id, len(hydrated["lans"]))
async def teardown(
@@ -158,12 +166,16 @@ async def teardown(
# LAN membership list via the hydrated blob if available.
hydrated = row.hydrated if row and row.topology_id == topology_id else None
compose_path = _topology_compose_path(topology_id)
compose_project = _topology_compose_project(topology_id)
client = docker.from_env()
def _dismantle() -> None:
if compose_path.exists():
try:
_compose("down", "--remove-orphans", compose_file=compose_path)
_compose(
"down", "--remove-orphans",
compose_file=compose_path, project=compose_project,
)
except subprocess.CalledProcessError as exc:
log.warning(
"topology %s compose down failed (continuing): %s",

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Agent-side sqlite cache of the currently-applied topology.
**This is a cache, not a source of truth.** The master is the only
@@ -63,6 +64,7 @@ class TopologyStore:
# The agent is single-process, so there's no real contention —
# sqlite's own connection lock is enough.
self._conn = sqlite3.connect(str(db_path), check_same_thread=False)
self._conn.row_factory = sqlite3.Row
self._conn.execute(
"CREATE TABLE IF NOT EXISTS applied_topology ("
" topology_id TEXT PRIMARY KEY,"
@@ -84,11 +86,11 @@ class TopologyStore:
if row is None:
return None
return AppliedRow(
topology_id=row[0],
applied_version_hash=row[1],
hydrated=json.loads(row[2]),
applied_at=int(row[3]),
last_error=row[4],
topology_id=row["topology_id"],
applied_version_hash=row["applied_version_hash"],
hydrated=json.loads(row["hydrated_blob_json"]),
applied_at=int(row["applied_at"]),
last_error=row["last_error"],
)
# ---------------------------------------------------------------- writes

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""
Machine archetype profiles for DECNET deckies.

View File

@@ -0,0 +1 @@
"""Artifact storage helpers shared between the web router and TTP workers."""

86
decnet/artifacts/paths.py Normal file
View File

@@ -0,0 +1,86 @@
"""
Shared on-disk artifact path resolution.
Honeypot decoys (SSH, SMTP) farm captured payloads into a host-mounted
quarantine tree:
/var/lib/decnet/artifacts/{decky}/{service}/{stored_as}
Two callers need to translate ``(decky, stored_as, service)`` into a
concrete ``Path`` rooted under that tree:
* The web router endpoint ``GET /api/v1/artifacts/{decky}/{stored_as}``
(``decnet.web.router.artifacts.api_get_artifact``) — admin-gated
download for the dashboard.
* The TTP ``EmailLifter`` (``decnet.ttp.impl.email_lifter``), which
reads the stored ``.eml`` at tag-time so body-aware predicates
(R0047 BEC, R0048 macro) don't need raw body text on the bus.
Both callers share the same validation rules and the same
defence-in-depth symlink-escape check; this module is the single
implementation. It is auth-agnostic — wrappers layer authentication
where appropriate (the router does ``require_admin``, the lifter does
not).
"""
from __future__ import annotations
import os
import re
from pathlib import Path
# decky names come from the deployer — lowercase alnum plus hyphens.
_DECKY_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}$")
# Services that own an artifacts subdir. Kept explicit so a caller
# can't pivot into arbitrary subpaths via a query string or bus payload.
_ALLOWED_SERVICES = frozenset({"ssh", "smtp"})
# stored_as is assembled by the capturing template as:
# ${ts}_${sha:0:12}_${base}
# where ts is ISO-8601 UTC (e.g. 2026-04-18T02:22:56Z), sha is 12 hex chars,
# and base is the original filename's basename. Keep the filename charset
# tight but allow common punctuation dropped files actually use.
_STORED_AS_RE = re.compile(
r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z_[a-f0-9]{12}_[A-Za-z0-9._-]{1,255}$"
)
# Module-level so tests can monkeypatch. Override via env in production
# (the systemd unit sets this) — the prod path matches the bind mount
# declared in decnet/services/{ssh,smtp}.py.
ARTIFACTS_ROOT = Path(
os.environ.get("DECNET_ARTIFACTS_ROOT", "/var/lib/decnet/artifacts")
)
class ArtifactPathError(ValueError):
"""Raised when (decky, stored_as, service) fails validation or escapes
the artifacts root.
The router catches this and re-raises HTTPException(400). The lifter
catches it and treats the event as having no body available (no-tag).
"""
def resolve_artifact_path(decky: str, stored_as: str, service: str) -> Path:
"""Validate inputs, resolve the on-disk path, and confirm it stays
inside the artifacts root.
Raises :class:`ArtifactPathError` on any violation. Does NOT check
that the file exists — callers handle that distinctly (404 for the
router, no-tag for the lifter).
"""
if service not in _ALLOWED_SERVICES:
raise ArtifactPathError("invalid service")
if not _DECKY_RE.fullmatch(decky):
raise ArtifactPathError("invalid decky name")
if not _STORED_AS_RE.fullmatch(stored_as):
raise ArtifactPathError("invalid stored_as")
root = ARTIFACTS_ROOT.resolve()
candidate = (root / decky / service / stored_as).resolve()
# defence-in-depth: even though the regexes reject `..`, make sure a
# symlink or weird filesystem state can't escape the root.
if root not in candidate.parents and candidate != root:
raise ArtifactPathError("path escapes artifacts root")
return candidate

129
decnet/artifacts/shards.py Normal file
View File

@@ -0,0 +1,129 @@
"""Shared asciinema shard helpers.
Extracted from ``decnet/web/router/transcripts/api_get_transcript.py``
so non-router callers (the BEHAVE-SHELL session-ended handler in
``decnet/profiler/worker.py``, the collector's session aggregator)
can resolve shard paths without crossing the layer boundary into the
FastAPI router.
Functions here speak in :class:`ValueError` — callers that want HTTP
semantics translate at the boundary. The router wrappers keep their
existing ``HTTPException`` behaviour for backwards compatibility.
PII boundary unchanged: shards live on disk; this module returns
:class:`pathlib.Path` pointers, never byte content. The ``_get_index``
cache stores byte offsets only.
"""
from __future__ import annotations
import os
import re
from collections import OrderedDict
from pathlib import Path
ARTIFACTS_ROOT = Path(
os.environ.get("DECNET_ARTIFACTS_ROOT", "/var/lib/decnet/artifacts"),
)
_DECKY_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}$")
_SERVICE_RE = re.compile(r"^(ssh|telnet)$")
_SHARD_BASENAME_RE = re.compile(r"^sessions-\d{4}-\d{2}-\d{2}\.jsonl$")
_SID_LINE_RE = re.compile(rb'"sid"\s*:\s*"([a-f0-9-]{36})"')
# (path, mtime_ns) → {sid: [(offset, length), ...]}
_INDEX_CACHE: "OrderedDict[tuple[str, int], dict[str, list[tuple[int, int]]]]" = (
OrderedDict()
)
_CACHE_MAX = 32
def validate_names(decky: str, service: str) -> None:
"""Raise :class:`ValueError` if ``decky`` / ``service`` look forged."""
if not _DECKY_RE.fullmatch(decky):
raise ValueError(f"invalid decky name: {decky!r}")
if not _SERVICE_RE.fullmatch(service):
raise ValueError(f"invalid service: {service!r}")
def resolve_shard(decky: str, service: str, shard_name: str) -> Path:
"""Resolve ``ARTIFACTS_ROOT/{decky}/{service}/transcripts/{shard_name}``
with escape-attempt detection. Raises :class:`ValueError` on
invalid inputs.
"""
validate_names(decky, service)
if not _SHARD_BASENAME_RE.fullmatch(shard_name):
raise ValueError(f"invalid shard name: {shard_name!r}")
root = ARTIFACTS_ROOT.resolve()
candidate = (root / decky / service / "transcripts" / shard_name).resolve()
if root not in candidate.parents and candidate != root:
raise ValueError(f"path escapes artifacts root: {candidate}")
return candidate
def _build_index(path: Path) -> dict[str, list[tuple[int, int]]]:
index: dict[str, list[tuple[int, int]]] = {}
with path.open("rb") as f:
offset = 0
for line in f:
length = len(line)
m = _SID_LINE_RE.search(line)
if m:
sid = m.group(1).decode("ascii")
index.setdefault(sid, []).append((offset, length))
offset += length
return index
def get_index(path: Path) -> tuple[dict[str, list[tuple[int, int]]], int]:
"""Return ``(sid → [(offset, length), …], file_size)``.
Cached by ``(path, mtime_ns)``; rebuilt when the shard changes.
"""
st = path.stat()
key = (str(path), st.st_mtime_ns)
if key in _INDEX_CACHE:
_INDEX_CACHE.move_to_end(key)
return _INDEX_CACHE[key], st.st_size
index = _build_index(path)
_INDEX_CACHE[key] = index
_INDEX_CACHE.move_to_end(key)
while len(_INDEX_CACHE) > _CACHE_MAX:
_INDEX_CACHE.popitem(last=False)
return index, st.st_size
def find_shard_with_sid(decky: str, service: str, sid: str) -> Path | None:
"""Scan every ``sessions-YYYY-MM-DD.jsonl`` under the decky's
transcripts dir until one claims this ``sid``.
Newest shards first — most lookups are for recent sessions. Caches
the per-shard sid index, so repeated calls are ~free until the
shard's mtime changes.
Returns ``None`` when nothing claims the sid OR when the
transcripts dir is missing / unreadable. Never raises on
filesystem-level errors — callers treat ``None`` as "skip".
"""
validate_names(decky, service)
root = ARTIFACTS_ROOT.resolve()
transcripts_dir = (root / decky / service / "transcripts").resolve()
if root not in transcripts_dir.parents:
return None
try:
if not transcripts_dir.is_dir():
return None
entries = list(transcripts_dir.iterdir())
except (OSError, PermissionError):
return None
shards = sorted(
(p for p in entries if _SHARD_BASENAME_RE.fullmatch(p.name)),
reverse=True,
)
for shard in shards:
try:
index, _size = get_index(shard)
except (OSError, PermissionError):
continue
if sid in index:
return shard
return None

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""
IP-to-ASN enrichment — maps attacker IPs to BGP-announced AS numbers and
org names for attacker intelligence.
@@ -6,7 +7,7 @@ Public surface mirrors :mod:`decnet.geoip` so callers can compose them:
* :func:`get_lookup` — returns the singleton :class:`AsnLookup`.
* :func:`enrich_ip` — takes an IP string, returns
``(asn_int, asn_name, provider_name)`` or ``(None, None, None)``.
``(asn_int, asn_name, bgp_prefix, provider_name)`` or ``(None, None, None, None)``.
Provider selection goes through :func:`~decnet.asn.factory.get_provider`
(env ``DECNET_ASN_PROVIDER``, default ``iptoasn``). Direct imports of
@@ -51,8 +52,8 @@ def get_lookup(*, force_refresh: bool = False) -> AsnLookup:
return _lookup
def enrich_ip(ip: str) -> Tuple[Optional[int], Optional[str], Optional[str]]:
"""Return ``(asn, as_name, provider_name)`` or ``(None, None, None)``.
def enrich_ip(ip: str) -> Tuple[Optional[int], Optional[str], Optional[str], Optional[str]]:
"""Return ``(asn, as_name, bgp_prefix, provider_name)`` or ``(None, None, None, None)``.
Never raises — any lookup failure collapses to all-None so the
caller (profiler) can upsert the attacker row regardless.
@@ -62,15 +63,15 @@ def enrich_ip(ip: str) -> Tuple[Optional[int], Optional[str], Optional[str]]:
touching provider config.
"""
if os.environ.get("DECNET_ASN_ENABLED", "true").lower() == "false":
return (None, None, None)
return (None, None, None, None)
try:
lookup = get_lookup()
info = lookup.asn(ip)
if info is None:
return (None, None, None)
return (info.asn, info.name or None, _provider_name or "unknown")
return (None, None, None, None)
return (info.asn, info.name or None, info.prefix, _provider_name or "unknown")
except Exception:
return (None, None, None)
return (None, None, None, None)
def _files_stale(provider) -> bool:

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""ASN provider protocol — mirror of :mod:`decnet.geoip.base`.
Concrete providers (e.g. :mod:`decnet.asn.iptoasn`) implement this.

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""ASN provider factory — mirror of :mod:`decnet.geoip.factory`.
Dispatch key: ``DECNET_ASN_PROVIDER`` (default ``iptoasn``). Lazy

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""iptoasn.com IP→ASN provider.
Daily-refreshed gzipped TSV dump of the global BGP table, derived from

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""iptoasn.com bulk dump download.
One file: ``ip2asn-v4.tsv.gz``, ~5 MB compressed, refreshed daily.

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Parser for the iptoasn.com ``ip2asn-v4.tsv`` dump.
Line shape (gzipped, one row per BGP-announced prefix)::

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""iptoasn provider — orchestrates fetch + parse into an :class:`AsnLookup`.
Mirrors :class:`decnet.geoip.rir.provider.RirProvider` exactly: fetch,
@@ -13,7 +14,7 @@ from typing import Sequence
from decnet.asn.base import Provider
from decnet.asn.iptoasn.fetch import IPTOASN_SOURCES, fetch_all
from decnet.asn.iptoasn.parse import parse_file
from decnet.asn.lookup import AsnLookup
from decnet.asn.lookup import AsnLookup, Range
from decnet.asn.paths import ensure_root
logger = logging.getLogger("decnet.asn.iptoasn.provider")
@@ -54,7 +55,7 @@ class IptoasnProvider(Provider):
"asn.iptoasn: cache load failed, rebuilding: %s", exc
)
ranges = []
ranges: list[Range] = []
for path in self.data_paths():
if not path.exists():
continue

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Provider-agnostic IP→ASN lookup.
A :class:`AsnLookup` is a frozen, sorted array of ``(start_ip,
@@ -23,11 +24,25 @@ class AsnInfo:
asn: int
name: str # AS description / org name; "" if absent in the source data
prefix: Optional[str] = None # synthesized covering CIDR; set at lookup time, not at rest
Range = Tuple[int, int, AsnInfo]
def _synthesize_prefix(start_int: int, end_int: int, queried_int: int) -> Optional[str]:
"""Return the most-specific CIDR from [start, end] that contains queried_int."""
try:
for net in ipaddress.summarize_address_range(
ipaddress.IPv4Address(start_int), ipaddress.IPv4Address(end_int)
):
if queried_int >= int(net.network_address) and queried_int <= int(net.broadcast_address):
return str(net)
except (ValueError, TypeError):
pass
return None
@dataclass
class AsnLookup:
"""Indexed AS lookup over IPv4 ranges."""
@@ -88,7 +103,9 @@ class AsnLookup:
if idx < 0:
return None
if n <= self._ends[idx]:
return self._infos[idx]
info = self._infos[idx]
prefix = _synthesize_prefix(self._starts[idx], self._ends[idx], n)
return AsnInfo(asn=info.asn, name=info.name, prefix=prefix)
return None
def __len__(self) -> int:

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Filesystem layout for ASN data — mirror of :mod:`decnet.geoip.paths`.
``ASN_ROOT`` is where providers drop their raw files and cache indexes.

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""DECNET ServiceBus — pub/sub notification substrate.
The bus is the notification layer for DECNET's worker constellation. The DB

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Process-wide bus singleton for request-serving workers (API, SSE routes).
A single connected :class:`~decnet.bus.base.BaseBus` shared across request

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Bus abstractions: the :class:`Event` envelope and the :class:`BaseBus` ABC.
Every transport (NATS, in-process fake, null) speaks this contract. The

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Bus factory — selects a :class:`~decnet.bus.base.BaseBus` implementation.
Dispatch key: the ``DECNET_BUS_TYPE`` environment variable.
@@ -76,7 +77,7 @@ def _maybe_wrap_telemetry(bus: BaseBus) -> BaseBus:
up at all we no-op.
"""
try:
from decnet.telemetry import wrap_repository # type: ignore[attr-defined]
from decnet.telemetry import wrap_repository
except ImportError:
return bus
try:

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""In-process bus transports.
* :class:`FakeBus` — real pub/sub semantics without touching a socket. Used

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Wire protocol for the DECNET bus UNIX-socket transport.
Frame layout:

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Fire-and-forget publish helpers shared across every worker.
Lifted out of ``decnet/mutator/engine.py`` once a second caller showed up
@@ -58,7 +59,7 @@ def make_thread_safe_publisher(
contract the rest of this module already upholds.
"""
if bus is None:
return lambda _topic, _payload, _event_type="": None
return lambda _topic, _payload, _event_type="": None # type: ignore[misc]
def _publish(topic: str, payload: dict[str, Any], event_type: str = "") -> None:
# Stream threads may keep draining after the bus owner closed it

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Canonical topic hierarchy for the DECNET ServiceBus.
Locked early so consumers can subscribe with stable wildcard patterns.
@@ -17,6 +18,7 @@ Token structure (NATS-style, dot-separated):
attacker.scored
attacker.session.started
attacker.session.ended
attacker.observation.{primitive}
identity.formed
identity.observation.linked
identity.merged
@@ -28,12 +30,18 @@ Token structure (NATS-style, dot-separated):
campaign.unmerged
credential.captured
credential.reuse.detected
attribution.profile.state_changed
attribution.profile.multi_actor_suspected
canary.{token_id}.triggered
canary.{token_id}.placed
canary.{token_id}.revoked
system.log
system.bus.health
system.{worker}.health
email.received
ttp.tagged
ttp.rule.fired.{technique_id}
ttp.rule.suppressed
Wildcards (per :func:`decnet.bus.base.matches`):
@@ -52,8 +60,12 @@ IDENTITY = "identity"
CAMPAIGN = "campaign"
SYSTEM = "system"
CREDENTIAL = "credential"
ATTRIBUTION = "attribution"
ORCHESTRATOR = "orchestrator"
CANARY = "canary"
SMTP = "smtp"
EMAIL = "email"
TTP = "ttp"
# ─── Leaf event-type constants (the last segment of each topic) ──────────────
@@ -83,6 +95,24 @@ DECKY_MUTATE_REQUEST = "mutate_request"
# syslog sidechannel too) to interleave substrate-change markers into
# attacker traversals.
DECKY_MUTATION = "mutation"
# Per-service add/remove on a deployed decky (live; no full redeploy).
# Payload carries ``decky_name``, ``service_name``, optional
# ``topology_id``, and ``services`` (the post-mutation list). Consumers
# that watch substrate shape (correlator, dashboard, profiler) reconcile
# off these without waiting for the next decnet-state.json snapshot.
DECKY_SERVICE_ADDED = "service_added"
DECKY_SERVICE_REMOVED = "service_removed"
# Per-service config change (the schema-driven Inspector form). Payload
# carries ``decky_name``, ``service_name``, optional ``topology_id``,
# ``service_config`` (the new validated dict), and ``recreated`` — true
# when the operator hit Apply (container was force-recreated to pick up
# the new env), false when they only hit Save (DB-only).
DECKY_SERVICE_CONFIG_CHANGED = "service_config_changed"
# Async deploy/mutate operation transitions
# (pending/running/succeeded/failed). Payload: {lifecycle_id, operation,
# status, error?}. UI polling endpoint is the source of truth; this
# fires for live subscribers (dashboard, mutator-side audit, etc).
DECKY_LIFECYCLE = "lifecycle"
# Attacker event types (second token under the ``attacker`` root). First
# sighting, session boundary transitions, and score-threshold crossings
@@ -90,10 +120,27 @@ DECKY_MUTATION = "mutation"
# the wildcard ``attacker.>``.
ATTACKER_OBSERVED = "observed"
ATTACKER_SCORED = "scored"
# Published once per successful active probe result (JARM/HASSH/TCPfp).
# Published once per successful active probe result (JARM/HASSH/TCPfp/ipv6_leak).
# Distinct from ``observed`` which is the correlator's first-sight signal —
# a fingerprint is additional evidence about an already-observed attacker.
# Known payload ``kind`` discriminators carried in this topic:
# "jarm" — JARM TLS server hash (prober)
# "hassh" — HASSHServer SSH key-exchange hash (prober)
# "tcpfp" — TCP/IP stack fingerprint hash (prober)
# "tls_cert" — leaf TLS certificate SHA-256 (prober)
# "ipv6_leak" — fe80:: link-local address observed via passive sniffer
# or active ICMPv6 solicitation (prober + sniffer);
# payload: {attacker_ip, addr, iid_kind, mac_oui, vector,
# on_iface, observed_at}
ATTACKER_FINGERPRINTED = "fingerprinted"
# Published when the prober observes a NEW hash for an
# (attacker_ip, port, probe_type) triple it has seen before — i.e. the
# attacker rotated their VPS, rebuilt their SSH server, swapped their
# TLS cert. Distinct from ``fingerprinted`` which fires on every probe
# result; ``fingerprint_rotated`` fires only on diff and carries both
# old_hash + new_hash. Producer: prober (via the rotation library);
# consumers: dashboard, forensics, attribution clustering.
ATTACKER_FINGERPRINT_ROTATED = "fingerprint_rotated"
ATTACKER_SESSION_STARTED = "session.started"
ATTACKER_SESSION_ENDED = "session.ended"
# Published by the ``decnet enrich`` worker after an enrichment pass
@@ -101,6 +148,19 @@ ATTACKER_SESSION_ENDED = "session.ended"
# returned a verdict). Payload carries the aggregate verdict + per-
# provider summary so SIEM-bound webhooks don't need to re-query the DB.
ATTACKER_INTEL_ENRICHED = "intel.enriched"
# Per-primitive BEHAVE-SHELL observation. Full topic shape:
# attacker.observation.<primitive>
# e.g. ``attacker.observation.motor.input_modality``. Producer:
# ``decnet/profiler/behave_shell/`` (extractor library called from the
# profiler worker on ``attacker.session.ended``); consumers: dashboard
# SSE relay, attribution engine state machine, federation gossip
# (post-v0). See development/BEHAVE-INTEGRATION.md §"Bus topics" for
# the wire-format contract — the prefix is documentation + pattern
# match only; bus auth is socket file perms (DEBT-029 §2), not
# topic-level. The ``primitive`` segment MAY contain dots
# (``motor.shell_mastery.tab_completion``) — the same dotted-leaf
# rule that ``attacker.session.ended`` uses.
ATTACKER_OBSERVATION_PREFIX = "observation"
# Identity-resolution event types (second/third tokens under ``identity``).
# Published by the (future) clusterer worker — see
@@ -168,6 +228,42 @@ CAMPAIGN_UNMERGED = "unmerged"
CREDENTIAL_CAPTURED = "captured"
CREDENTIAL_REUSE_DETECTED = "reuse.detected"
# Attribution-engine event types (second/third tokens under
# ``attribution``). Published by the v0 attribution worker
# (``decnet.correlation.attribution_worker``) which subscribes to
# ``attacker.observation.>`` and runs the per-(identity, primitive)
# state machine. See ``development/ATTRIBUTION-ENGINE.md``.
#
# attribution.profile.state_changed — per-primitive state
# transition (e.g.
# stable → drifting).
# Payload: identity_uuid,
# primitive, old_state,
# new_state, current_value,
# confidence,
# observation_count, ts.
# attribution.profile.multi_actor_suspected — fires when ≥ 2
# primitives flag the same
# identity as multi_actor
# concurrently. Cross-
# primitive correlator;
# single-primitive
# multi_actor is too noisy
# on its own. Payload:
# identity_uuid, primitives,
# evidence_summary,
# confidence, ts.
#
# These are *derived* signals — distinct from
# ``identity.*`` (clusterer lifecycle, IDENTITY_RESOLUTION.md) and
# ``attacker.observation.*`` (raw extractor envelopes,
# BEHAVE-INTEGRATION.md). The three families compose: observations feed
# the attribution engine, the engine emits derived state, the clusterer
# reads observations + state to form / merge identities.
ATTRIBUTION_PROFILE_PREFIX = "profile"
ATTRIBUTION_PROFILE_STATE_CHANGED = "profile.state_changed"
ATTRIBUTION_PROFILE_MULTI_ACTOR_SUSPECTED = "profile.multi_actor_suspected"
# Canary-token event types (third token under ``canary``).
#
# canary.{token_id}.placed — orchestrator/API successfully planted a
@@ -231,6 +327,43 @@ WORKER_CONTROL_START = "start"
# of patterns. Payload is currently empty; consumers only need the signal.
WEBHOOK_SUBSCRIPTIONS_CHANGED = "system.webhook.subscriptions_changed"
# Email-receipt event — fired by smtp / smtp-relay services on full-message
# receipt (envelope + headers + body + attachments captured). Single-token
# leaf so the bus tokenizer accepts it directly under the ``email`` root.
# Consumed by the TTP ``email_lifter`` for header / body-pattern / attachment
# rules. PII rule (TTP_TAGGING.md "Hard parts §6"): payload carries hashes,
# counts, header names, and rcpt-domain sets — never rcpt addresses or body
# bytes.
EMAIL_RECEIVED = "received"
# TTP-tagging event types (second/third tokens under ``ttp``).
#
# ttp.tagged — one or more new tags written. Published
# only when ``INSERT OR IGNORE`` wrote at
# least one new row; idempotent
# re-evaluations publish nothing
# (loop-prevention invariant — see
# TTP_TAGGING.md).
# ttp.rule.fired.{technique_id} — per-technique fan-out for SIEM
# consumers that subscribe to a single
# technique. Topic key is the parent
# technique; sub_technique is in the
# payload. Built via :func:`ttp_rule_fired`.
# ttp.rule.suppressed — rule fired but the tag was dropped
# (confidence below floor, rate-limited,
# or the rule's RuleState was disabled).
# Observability signal for the dashboard.
#
# Per-rule reload + state-change topics. Built via
# :func:`ttp_rule_reloaded` / :func:`ttp_rule_state`; SIEM consumers
# subscribe to ``ttp.rule.reloaded.>`` (every rule) or
# ``ttp.rule.reloaded.R0001`` (one rule) at their preferred granularity.
TTP_TAGGED = "tagged"
TTP_RULE_FIRED = "rule.fired"
TTP_RULE_SUPPRESSED = "rule.suppressed"
TTP_RULE_RELOADED = "rule.reloaded"
TTP_RULE_STATE = "rule.state"
# ─── Builders ────────────────────────────────────────────────────────────────
@@ -264,6 +397,12 @@ def decky_mutation(decky_id: str) -> str:
return f"{DECKY}.{decky_id}.{DECKY_MUTATION}"
def decky_lifecycle(decky_id: str) -> str:
"""Build ``decky.<id>.lifecycle``."""
_reject_tokens(decky_id)
return f"{DECKY}.{decky_id}.{DECKY_LIFECYCLE}"
def system(event_type: str) -> str:
"""Build ``system.<event_type>``.
@@ -301,6 +440,42 @@ def attacker(event_type: str) -> str:
return f"{ATTACKER}.{event_type}"
def attacker_observation(primitive: str) -> str:
"""Build ``attacker.observation.<primitive>``.
*primitive* is the fully-qualified BEHAVE-SHELL primitive path
(e.g. ``motor.input_modality``,
``cognitive.feedback_loop_engagement``,
``motor.shell_mastery.tab_completion``). Dotted primitives are
permitted — this matches the format
``behave_shell.spec.event_adapter.event_topic_for`` produces
upstream, and DECNET's bus admits the dotted leaf the same way
:func:`attacker` does for ``session.started``.
Empty string is rejected so a downstream typo doesn't ship as
``attacker.observation.``.
"""
if not primitive:
raise ValueError(
"attacker_observation topic requires a non-empty primitive",
)
return f"{ATTACKER}.{ATTACKER_OBSERVATION_PREFIX}.{primitive}"
def attribution(event_type: str) -> str:
"""Build ``attribution.<event_type>``.
*event_type* is typically one of
:data:`ATTRIBUTION_PROFILE_STATE_CHANGED` or
:data:`ATTRIBUTION_PROFILE_MULTI_ACTOR_SUSPECTED` — both contain a
dot (``profile.state_changed``) which is permitted under the same
"trailing dotted leaf" rule that ``attacker.session.started`` uses.
"""
if not event_type:
raise ValueError("attribution topic requires a non-empty event_type")
return f"{ATTRIBUTION}.{event_type}"
def campaign(event_type: str) -> str:
"""Build ``campaign.<event_type>``.
@@ -381,6 +556,86 @@ def system_control(worker: str) -> str:
return f"{SYSTEM}.{worker}.{SYSTEM_CONTROL}"
def smtp(event_type: str) -> str:
"""Build ``smtp.<event_type>``.
*event_type* may contain dots (e.g. ``probe.pending``).
"""
if not event_type:
raise ValueError("smtp topic requires a non-empty event_type")
return f"{SMTP}.{event_type}"
def email_topic(event_type: str) -> str:
"""Build ``email.<event_type>``.
Named ``email_topic`` rather than ``email`` to avoid shadowing the
Python ``email`` stdlib package at import sites that pull both.
*event_type* is typically :data:`EMAIL_RECEIVED`.
"""
if not event_type:
raise ValueError("email topic requires a non-empty event_type")
return f"{EMAIL}.{event_type}"
def ttp(event_type: str) -> str:
"""Build ``ttp.<event_type>``.
*event_type* is typically one of :data:`TTP_TAGGED`,
:data:`TTP_RULE_FIRED`, or :data:`TTP_RULE_SUPPRESSED`. Dotted
leaves (``rule.fired``) are permitted — same rationale as
:func:`system`. For per-technique fan-out use
:func:`ttp_rule_fired`.
"""
if not event_type:
raise ValueError("ttp topic requires a non-empty event_type")
return f"{TTP}.{event_type}"
def ttp_rule_fired(technique_id: str) -> str:
"""Build ``ttp.rule.fired.<technique_id>``.
Per-technique fan-out: SIEM subscribers can listen on
``ttp.rule.fired.>`` for everything, ``ttp.rule.fired.T1110`` for
one technique. *technique_id* is validated as a single segment —
sub-techniques like ``T1110.001`` are rejected because they would
split into two tokens. The topic key is the parent technique;
``sub_technique_id`` lives in the payload.
"""
_reject_tokens(technique_id)
return f"{TTP}.rule.fired.{technique_id}"
def ttp_rule_reloaded(rule_id: str) -> str:
"""Build ``ttp.rule.reloaded.<rule_id>``.
Per-rule fan-out fired by the :class:`~decnet.ttp.store.base.RuleStore`
when a rule's *definition* changes (YAML edit on the filesystem
backend, ``ttp_rule`` row update on the database backend). One event
per per-rule edit — never batched (the "incremental, never batched"
property in TTP_TAGGING.md §"Bus topics" inherits its granularity
from :meth:`RuleStore.subscribe_changes`).
Subscribers: ``ttp.rule.reloaded.>`` for every rule,
``ttp.rule.reloaded.R0001`` for one. *rule_id* is validated as a
single segment.
"""
_reject_tokens(rule_id)
return f"{TTP}.{TTP_RULE_RELOADED}.{rule_id}"
def ttp_rule_state(rule_id: str) -> str:
"""Build ``ttp.rule.state.<rule_id>``.
Per-rule fan-out fired by the :class:`~decnet.ttp.store.base.RuleStore`
when a rule's *operational state* changes (operator hits the disable
button, an ``expires_at`` TTL fires and auto-reverts the state).
*rule_id* is validated as a single segment.
"""
_reject_tokens(rule_id)
return f"{TTP}.{TTP_RULE_STATE}.{rule_id}"
def _reject_tokens(*parts: str) -> None:
"""Reject topic segments that would break NATS-style tokenization.

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""UNIX-socket client — :class:`UnixSocketBus` implementation of :class:`BaseBus`.
Holds one open socket to the local :class:`~decnet.bus.unix_server.BusServer`.

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""UNIX-socket server for the DECNET bus.
One :class:`BusServer` per host. Accepts local connections on a UNIX-domain

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""``decnet bus`` worker entrypoint.
Starts a :class:`~decnet.bus.unix_server.BusServer` on the configured UNIX

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Canary tokens — decoy artifacts planted in decky filesystems.
Public surface is exported here so callers can ``from decnet.canary

View File

@@ -0,0 +1,19 @@
// SPDX-License-Identifier: AGPL-3.0-or-later
// Node helper invoked by decnet.canary.obfuscator.
// Reads {code, options} JSON from stdin, writes obfuscated JS to stdout.
// Kept dependency-light on purpose: only javascript-obfuscator.
const JsObf = require('javascript-obfuscator');
let raw = '';
process.stdin.setEncoding('utf8');
process.stdin.on('data', (chunk) => { raw += chunk; });
process.stdin.on('end', () => {
try {
const { code, options } = JSON.parse(raw);
const result = JsObf.obfuscate(code, options || {});
process.stdout.write(result.getObfuscatedCode());
} catch (e) {
process.stderr.write(String(e && e.stack || e));
process.exit(2);
}
});

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Canary generator / instrumenter ABCs and the artifact dataclass.
Two flavors of producer share the same return shape:
@@ -100,6 +101,12 @@ class CanaryArtifact:
planting. Never leaked to the attacker-facing surface.
"""
fingerprint_nonce: Optional[str] = None
"""Per-mint HMAC nonce for fingerprint canaries; ``None`` for everything
else. Cultivator reads this and persists it on ``CanaryToken.fingerprint_nonce``
so the worker can validate incoming ``?k=`` params.
"""
class CanaryGenerator(ABC):
"""Produces a fake artifact from scratch."""

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Realism contract adapter for canary generators.
Stage 7 of the realism migration. The orchestrator's planner picks a
@@ -46,6 +47,8 @@ _CLASS_TO_GENERATOR: dict[ContentClass, str] = {
ContentClass.CANARY_HONEYDOC_DOCX: "honeydoc_docx",
ContentClass.CANARY_HONEYDOC_PDF: "honeydoc_pdf",
ContentClass.CANARY_MYSQL_DUMP: "mysql_dump",
ContentClass.CANARY_FINGERPRINT_HTML: "fingerprint_html",
ContentClass.CANARY_FINGERPRINT_SVG: "fingerprint_svg",
}
@@ -62,6 +65,8 @@ _GENERATOR_TO_KIND: dict[str, str] = {
"honeydoc_pdf": "http",
"ssh_key": "dns", # trip is DNS resolution of host comment
"mysql_dump": "dns", # trip is DNS resolution of subdomain
"fingerprint_html": "http", # obfuscated JS beacons GET /c/<slug>
"fingerprint_svg": "http", # same, embedded inside SVG <script>
}
@@ -78,6 +83,8 @@ _DEFAULT_PATH: dict[ContentClass, str] = {
ContentClass.CANARY_HONEYDOC_DOCX: "/home/{persona}/Documents/Q3-Operations-Review.docx",
ContentClass.CANARY_HONEYDOC_PDF: "/home/{persona}/Documents/Q3-Operations-Review.pdf",
ContentClass.CANARY_MYSQL_DUMP: "/var/backups/db_backup.sql",
ContentClass.CANARY_FINGERPRINT_HTML: "/home/{persona}/Documents/asset_directory.html",
ContentClass.CANARY_FINGERPRINT_SVG: "/home/{persona}/Documents/network_topology.svg",
}
@@ -136,10 +143,12 @@ async def cultivate(
)
callback_token = _new_callback_token()
http_base_str: str = http_base or os.environ.get("DECNET_CANARY_HTTP_BASE") or ""
dns_zone_str: str = dns_zone or os.environ.get("DECNET_CANARY_DNS_ZONE") or ""
ctx = CanaryContext(
callback_token=callback_token,
http_base=http_base or os.environ.get("DECNET_CANARY_HTTP_BASE", ""),
dns_zone=dns_zone or os.environ.get("DECNET_CANARY_DNS_ZONE", ""),
http_base=http_base_str,
dns_zone=dns_zone_str,
persona="linux", # all our deckies are POSIX in MVP
)
generator = get_generator(gen_name)
@@ -154,7 +163,7 @@ async def cultivate(
# attribute a callback if the artifact trips during the plant
# itself (improbable but possible — DOCX viewers can preview
# autoplay-style).
await repo.create_canary_token({
token_data: dict = {
"kind": _GENERATOR_TO_KIND.get(gen_name, "http"),
"decky_name": plan.decky_name,
"instrumenter": None,
@@ -165,7 +174,10 @@ async def cultivate(
"placed_at": datetime.now(timezone.utc),
"created_by": created_by,
"state": "planted",
})
}
if artifact.fingerprint_nonce is not None:
token_data["fingerprint_nonce"] = artifact.fingerprint_nonce
await repo.create_canary_token(token_data)
# Carry the placement_path on the artifact so the orchestrator's
# plant_file call uses it. We don't mutate the generator's

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Minimal authoritative DNS server for canary tokens (stdlib only).
We don't need a full resolver — only enough to:
@@ -131,7 +132,7 @@ def _build_response(
question = qname_bytes + struct.pack("!HH", query.qtype, query.qclass)
answer = b""
if an_count:
if an_count and answer_ip is not None:
# Use a name pointer back to the question (offset 12).
ptr = struct.pack("!H", 0xC000 | 12)
rdata = bytes(int(o) for o in answer_ip.split("."))
@@ -169,10 +170,10 @@ class CanaryDNSProtocol(asyncio.DatagramProtocol):
self._answer_ip = answer_ip
self._transport: Optional[asyncio.DatagramTransport] = None
def connection_made(self, transport) -> None: # type: ignore[override]
self._transport = transport # type: ignore[assignment]
def connection_made(self, transport) -> None:
self._transport = transport
def datagram_received( # type: ignore[override]
def datagram_received(
self, data: bytes, addr: Tuple[str, int],
) -> None:
try:
@@ -190,7 +191,7 @@ class CanaryDNSProtocol(asyncio.DatagramProtocol):
return
# Known name — answer with our sinkhole IP, then fire the hook.
self._send(addr, _build_response(query, answer_ip=self._answer_ip))
asyncio.create_task(self._hook(slug, query, addr[0]))
asyncio.ensure_future(self._hook(slug, query, addr[0]))
def _slug_for(self, qname: str) -> Optional[str]:
if not self._zone or not qname.endswith(self._suffix):

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Generator and instrumenter factories.
Same lazy-import pattern as :mod:`decnet.intel.factory` — concrete
@@ -21,6 +22,8 @@ KNOWN_GENERATORS: Tuple[str, ...] = (
"honeydoc_docx",
"honeydoc_pdf",
"mysql_dump",
"fingerprint_html",
"fingerprint_svg",
)
KNOWN_INSTRUMENTERS: Tuple[str, ...] = (
@@ -64,6 +67,16 @@ def get_generator(name: str) -> CanaryGenerator:
if name == "mysql_dump":
from decnet.canary.generators.mysql_dump import MySQLDumpGenerator
return MySQLDumpGenerator()
if name == "fingerprint_html":
from decnet.canary.generators.fingerprint_html import (
FingerprintHtmlGenerator,
)
return FingerprintHtmlGenerator()
if name == "fingerprint_svg":
from decnet.canary.generators.fingerprint_svg import (
FingerprintSvgGenerator,
)
return FingerprintSvgGenerator()
raise ValueError(
f"Unknown canary generator: {name!r}. Known: {KNOWN_GENERATORS}"
)

View File

@@ -0,0 +1,292 @@
// SPDX-License-Identifier: AGPL-3.0-or-later
// Canary fingerprint payload — the JS that runs inside an opened HTML/SVG
// canary, harvests browser primitives, and beacons the result back to the
// canary worker. Ported from canary-self-test.html with the rendering UI
// stripped out.
//
// Three placeholders are substituted by the Python builder BEFORE
// javascript-obfuscator runs:
//
// {{BEACON_URL}} → full URL to /c/<callback_token> (no trailing slash)
// {{MINT_UUID}} → per-mint UUID, baked into the string-array post-obf
// {{MINT_NONCE}} → 16-hex HMAC nonce; the worker rejects ?d=/?o= without it
//
// Beacon strategy (MVP): a bare GET pixel for "I was opened" reliability,
// then a fingerprint payload sent as a base64-URL query param on a second
// GET so the existing worker records the hit even before step-4 POST
// support lands. Both fail-open: any error short-circuits to next step.
(async function () {
var BEACON_URL = "{{BEACON_URL}}";
var MINT_UUID = "{{MINT_UUID}}";
var MINT_NONCE = "{{MINT_NONCE}}";
var fp = { mint: MINT_UUID };
function fire(url) {
try {
var img = new Image();
img.src = url;
} catch (e) { /* swallow */ }
}
// 1) bare-open beacon — fires regardless of whether the rest succeeds
fire(BEACON_URL + "?o=1&k=" + MINT_NONCE);
function sha256(str) {
var buf = new TextEncoder().encode(str);
return crypto.subtle.digest("SHA-256", buf).then(function (h) {
return Array.from(new Uint8Array(h))
.map(function (b) { return b.toString(16).padStart(2, "0"); })
.join("");
});
}
// navigator
try {
fp.nav = {
ua: navigator.userAgent,
pl: navigator.platform,
lg: navigator.language,
lgs: (navigator.languages || []).join(","),
ck: navigator.cookieEnabled,
dnt: navigator.doNotTrack,
hc: navigator.hardwareConcurrency,
dm: navigator.deviceMemory || null,
tp: navigator.maxTouchPoints,
wd: navigator.webdriver === true,
pdf: navigator.pdfViewerEnabled || null,
};
} catch (e) { fp.nav = { err: String(e) }; }
// screen
try {
fp.scr = {
w: screen.width, h: screen.height,
aw: screen.availWidth, ah: screen.availHeight,
cd: screen.colorDepth, pd: screen.pixelDepth,
dpr: window.devicePixelRatio,
iw: window.innerWidth, ih: window.innerHeight,
or: (screen.orientation && screen.orientation.type) || null,
};
} catch (e) { fp.scr = { err: String(e) }; }
// tz / locale
try {
var dtf = Intl.DateTimeFormat().resolvedOptions();
fp.tz = {
z: dtf.timeZone, lc: dtf.locale,
ca: dtf.calendar, ns: dtf.numberingSystem,
off: new Date().getTimezoneOffset(),
};
} catch (e) { fp.tz = { err: String(e) }; }
// connection
try {
var c = navigator.connection;
fp.cn = c ? {
t: c.effectiveType, dl: c.downlink, rtt: c.rtt, sd: c.saveData,
} : null;
} catch (e) { fp.cn = { err: String(e) }; }
// canvas
try {
var cv = document.createElement("canvas");
cv.width = 280; cv.height = 60;
var ctx = cv.getContext("2d");
ctx.textBaseline = "top";
ctx.font = "14px Arial";
ctx.fillStyle = "#f60";
ctx.fillRect(125, 1, 62, 20);
ctx.fillStyle = "#069";
ctx.fillText("c-" + String.fromCharCode(0x1f600), 2, 15);
ctx.fillStyle = "rgba(102,204,0,0.7)";
ctx.fillText("c-" + String.fromCharCode(0x1f600), 4, 17);
var dataURL = cv.toDataURL();
fp.cv = { h: await sha256(dataURL), n: dataURL.length };
} catch (e) { fp.cv = { err: String(e) }; }
// webgl
try {
var gc = document.createElement("canvas");
var gl = gc.getContext("webgl") || gc.getContext("experimental-webgl");
if (gl) {
var ext = gl.getExtension("WEBGL_debug_renderer_info");
fp.gl = {
v: gl.getParameter(gl.VENDOR),
r: gl.getParameter(gl.RENDERER),
ver: gl.getParameter(gl.VERSION),
sl: gl.getParameter(gl.SHADING_LANGUAGE_VERSION),
uv: ext ? gl.getParameter(ext.UNMASKED_VENDOR_WEBGL) : null,
ur: ext ? gl.getParameter(ext.UNMASKED_RENDERER_WEBGL) : null,
};
} else { fp.gl = { err: "unavailable" }; }
} catch (e) { fp.gl = { err: String(e) }; }
// audio
try {
var ACtx = window.OfflineAudioContext || window.webkitOfflineAudioContext;
if (ACtx) {
var actx = new ACtx(1, 44100, 44100);
var osc = actx.createOscillator();
var cmp = actx.createDynamicsCompressor();
osc.type = "triangle"; osc.frequency.value = 10000;
cmp.threshold.value = -50; cmp.knee.value = 40;
cmp.ratio.value = 12; cmp.attack.value = 0; cmp.release.value = 0.25;
osc.connect(cmp); cmp.connect(actx.destination);
osc.start(0);
var buf = await actx.startRendering();
var data = buf.getChannelData(0).slice(4500, 5000);
var sum = 0;
for (var i = 0; i < data.length; i++) sum += Math.abs(data[i]);
fp.au = { h: await sha256(sum.toString()), s: sum.toFixed(8) };
} else { fp.au = { err: "unavailable" }; }
} catch (e) { fp.au = { err: String(e) }; }
// fonts
try {
var bases = ["monospace", "sans-serif", "serif"];
var tests = [
"Arial", "Helvetica", "Times New Roman", "Courier New", "Verdana",
"Georgia", "Trebuchet MS", "Comic Sans MS", "Impact",
"Calibri", "Cambria", "Consolas", "Segoe UI", "Tahoma",
"JetBrains Mono", "Fira Code", "Cascadia Code", "SF Mono",
"Menlo", "Monaco", "Source Code Pro", "Inconsolata", "Hack",
"San Francisco", "Helvetica Neue", "Lucida Grande",
"DejaVu Sans", "DejaVu Sans Mono", "Liberation Sans",
"Liberation Mono", "Ubuntu", "Ubuntu Mono", "Roboto",
"Noto Sans", "Noto Mono",
"Microsoft YaHei", "SimSun", "PingFang SC", "Hiragino Sans",
"Hiragino Kaku Gothic Pro", "Yu Gothic", "Meiryo",
"Malgun Gothic", "Noto Sans CJK",
"Adobe Garamond Pro", "Myriad Pro", "Minion Pro",
"Bahnschrift", "Cyberpunk",
];
var sp = document.createElement("span");
sp.style.fontSize = "72px";
sp.style.position = "absolute";
sp.style.left = "-9999px";
sp.innerHTML = "mmmmmmmmmmlli";
document.body.appendChild(sp);
var bs = {};
for (var bi = 0; bi < bases.length; bi++) {
sp.style.fontFamily = bases[bi];
bs[bases[bi]] = { w: sp.offsetWidth, h: sp.offsetHeight };
}
var det = [];
for (var ti = 0; ti < tests.length; ti++) {
for (var bj = 0; bj < bases.length; bj++) {
sp.style.fontFamily = "'" + tests[ti] + "'," + bases[bj];
if (sp.offsetWidth !== bs[bases[bj]].w ||
sp.offsetHeight !== bs[bases[bj]].h) {
det.push(tests[ti]); break;
}
}
}
document.body.removeChild(sp);
fp.ft = {
h: await sha256(det.slice().sort().join(",")),
n: det.length, t: tests.length, d: det,
};
} catch (e) { fp.ft = { err: String(e) }; }
// webrtc local ip leak
try {
var ips = {}; var cands = [];
var RPC = window.RTCPeerConnection || window.webkitRTCPeerConnection ||
window.mozRTCPeerConnection;
if (RPC) {
var pc = new RPC({ iceServers: [{ urls: "stun:stun.l.google.com:19302" }] });
pc.createDataChannel("");
pc.onicecandidate = function (e) {
if (!e.candidate) return;
cands.push(e.candidate.candidate);
var m = e.candidate.candidate.match(
/(\d+\.\d+\.\d+\.\d+|[a-f0-9:]+::[a-f0-9:]+)/);
if (m) ips[m[1]] = 1;
};
var off = await pc.createOffer();
await pc.setLocalDescription(off);
await new Promise(function (r) { setTimeout(r, 1500); });
pc.close();
fp.rtc = { ip: Object.keys(ips), n: cands.length, c: cands.slice(0, 3) };
} else { fp.rtc = { err: "unavailable" }; }
} catch (e) { fp.rtc = { err: String(e) }; }
// battery
try {
if (navigator.getBattery) {
var bat = await navigator.getBattery();
fp.bt = {
c: bat.charging, l: bat.level,
ct: bat.chargingTime === Infinity ? "inf" : bat.chargingTime,
dt: bat.dischargingTime === Infinity ? "inf" : bat.dischargingTime,
};
} else { fp.bt = { err: "unavailable" }; }
} catch (e) { fp.bt = { err: String(e) }; }
// perf timing jitter
try {
var samples = [];
for (var pi = 0; pi < 1000; pi++) {
var pa = performance.now();
var x = 0;
for (var pj = 0; pj < 1000; pj++) x += Math.sqrt(pj);
samples.push(performance.now() - pa);
}
samples.sort(function (a, b) { return a - b; });
fp.pf = {
med: samples[500].toFixed(4),
p95: samples[950].toFixed(4),
mn: samples[0].toFixed(4),
mx: samples[999].toFixed(4),
};
} catch (e) { fp.pf = { err: String(e) }; }
// permissions
try {
if (navigator.permissions) {
var names = ["geolocation", "notifications", "camera", "microphone",
"persistent-storage", "clipboard-read", "clipboard-write"];
var st = {};
for (var ni = 0; ni < names.length; ni++) {
try {
var r = await navigator.permissions.query({ name: names[ni] });
st[names[ni]] = r.state;
} catch (e) { st[names[ni]] = "unsupported"; }
}
fp.pm = st;
} else { fp.pm = { err: "unavailable" }; }
} catch (e) { fp.pm = { err: String(e) }; }
// composite identity hash — stable inputs only
try {
var stable = [
fp.cv && fp.cv.h, fp.au && fp.au.h, fp.ft && fp.ft.h,
fp.gl && fp.gl.ur, fp.nav && fp.nav.pl,
fp.nav && fp.nav.hc, fp.tz && fp.tz.z,
fp.scr && (fp.scr.w + "x" + fp.scr.h),
].filter(Boolean).join("|");
fp.id = await sha256(stable);
} catch (e) { fp.id = { err: String(e) }; }
// 2) ship the payload as base64url JSON on a GET query param.
// The current worker records the hit on /c/<slug>; step-4 worker
// will decode ?d= and persist the fingerprint blob.
try {
var json = JSON.stringify(fp);
var b64 = btoa(unescape(encodeURIComponent(json)))
.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
// chunk if URL would exceed safe limit (~6KB)
var MAX = 6000;
if (b64.length <= MAX) {
fire(BEACON_URL + "?d=" + b64 + "&k=" + MINT_NONCE);
} else {
var sid = (Math.random() * 1e9 | 0).toString(36);
var total = Math.ceil(b64.length / MAX);
for (var ci = 0; ci < total; ci++) {
var part = b64.substr(ci * MAX, MAX);
fire(BEACON_URL + "?s=" + sid + "&i=" + ci + "&n=" + total + "&d=" + part + "&k=" + MINT_NONCE);
}
}
} catch (e) { /* swallow */ }
})();

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Built-in canary generators (synthesised fake artifacts).
Concrete classes live in sibling modules and are imported lazily by

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Fake ``~/.aws/credentials`` block (passive bait).
This is the **passive** variant — no callback wiring. An attacker

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Fake ``.env`` with embedded callback URLs.
Modern web stacks read environment variables for everything from

View File

@@ -0,0 +1,141 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""HTML fingerprint canary — plausible-looking page with an obfuscated
browser-fingerprinting payload inlined at the bottom of ``<body>``.
The visible content is a deliberately mundane "internal directory"
table — the kind of file a curious attacker pulls off a decky's
filesystem and opens locally to triage. When the file is opened in
*any* network-connected browser the obfuscated payload runs and beacons
to ``/c/<callback_token>``: first a bare-open pixel, then a chunked
fingerprint dump (canvas, audio, fonts, WebGL, WebRTC local IPs,
timing jitter, permissions, composite identity hash).
Determinism: the mint UUID is derived from the callback token via
:func:`uuid.uuid5` so the same ``ctx`` always produces byte-identical
output, satisfying the generator contract in :mod:`decnet.canary.base`.
The obfuscator's seed and polymorphic config bits are likewise
callback-token-derived (see :mod:`decnet.canary.obfuscator`).
"""
from __future__ import annotations
import hashlib
import uuid
from decnet.canary.base import CanaryArtifact, CanaryContext, CanaryGenerator
from decnet.canary.obfuscator import render_fingerprint_js, nonce_for
_MINT_NAMESPACE = uuid.UUID("a3f7c821-9d1e-4b6a-8c2d-1e4f9a7b3c5d")
def _mint_uuid_for(callback_token: str) -> str:
return str(uuid.uuid5(_MINT_NAMESPACE, callback_token))
def _stable_int(callback_token: str, salt: str = "") -> int:
"""Deterministic non-negative int derived from the callback token.
``builtins.hash`` is salted per-process — useless for a generator
that must be byte-identical across runs. SHA-256 prefix is
overkill but free.
"""
h = hashlib.sha256((callback_token + "|" + salt).encode("utf-8")).digest()
return int.from_bytes(h[:4], "big")
_PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Internal Asset Directory</title>
<style>
body{{font-family:Segoe UI,Arial,sans-serif;background:#fafafa;color:#222;
margin:24px;font-size:13px}}
h1{{font-size:18px;margin:0 0 4px 0}}
.sub{{color:#777;font-size:11px;margin-bottom:18px}}
table{{border-collapse:collapse;width:100%;background:#fff;
box-shadow:0 1px 2px rgba(0,0,0,.05)}}
th,td{{padding:6px 10px;border-bottom:1px solid #eee;text-align:left}}
th{{background:#f4f4f4;font-weight:600;font-size:11px;
text-transform:uppercase;letter-spacing:.5px;color:#555}}
tr:hover td{{background:#fafbff}}
.foot{{margin-top:16px;color:#999;font-size:11px}}
</style>
</head>
<body>
<h1>Internal Asset Directory</h1>
<div class="sub">last sync: {sync_label} · {row_count} entries · CONFIDENTIAL</div>
<table>
<tr><th>Hostname</th><th>Owner</th><th>Role</th><th>VLAN</th><th>Notes</th></tr>
{rows}
</table>
<div class="foot">page generated by directory-sync v2.4.1 — do not redistribute</div>
<script>{payload}</script>
</body>
</html>
"""
_ROW_POOL = (
("ny-app-01.corp.local", "k.tanaka", "app server", "vlan20", "primary"),
("ny-db-01.corp.local", "ops", "postgres primary", "vlan30", "backup nightly"),
("ny-build-02.corp.local", "ci-bot", "jenkins agent", "vlan40", ""),
("sf-vpn-01.corp.local", "netsec", "wireguard endpoint", "vlan10", "external"),
("ldn-mail-03.corp.local", "j.weber", "exchange edge", "vlan50", ""),
("hk-cache-01.corp.local", "ops", "redis replica", "vlan30", "lag <1s"),
("br-dev-04.corp.local", "m.silva", "dev sandbox", "vlan60", "ephemeral"),
("eu-bastion-02.corp.local", "secops", "ssh jump host", "vlan10", "mfa required"),
("us-archive-01.corp.local", "compliance", "log archive", "vlan70", "retain 7y"),
)
def _build_rows(callback_token: str) -> tuple[str, int]:
pick = _stable_int(callback_token, "pick") % len(_ROW_POOL)
take = 5 + (_stable_int(callback_token, "take") % 4)
selected = [_ROW_POOL[(pick + i) % len(_ROW_POOL)] for i in range(take)]
cells = "\n".join(
"<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>"
for row in selected
)
return cells, len(selected)
def _sync_label(callback_token: str) -> str:
day = _stable_int(callback_token, "day") % 28 + 1
hour = _stable_int(callback_token, "hour") % 24
return f"2026-04-{day:02d} {hour:02d}:14 UTC"
class FingerprintHtmlGenerator(CanaryGenerator):
"""Synthesise an HTML page that fingerprints the browser opening it."""
name = "fingerprint_html"
def generate(self, ctx: CanaryContext) -> CanaryArtifact:
mint_uuid = _mint_uuid_for(ctx.callback_token)
nonce = nonce_for(ctx.callback_token, mint_uuid)
payload = render_fingerprint_js(
callback_token=ctx.callback_token,
http_base=ctx.http_base,
mint_uuid=mint_uuid,
nonce=nonce,
)
rows, row_count = _build_rows(ctx.callback_token)
body = _PAGE_TEMPLATE.format(
sync_label=_sync_label(ctx.callback_token),
row_count=row_count,
rows=rows,
payload=payload,
)
beacon = f"{ctx.http_base.rstrip('/')}/c/{ctx.callback_token}"
return CanaryArtifact(
path="",
content=body.encode("utf-8"),
mode=0o644,
mtime_offset=-86400 * 14,
generator=self.name,
fingerprint_nonce=nonce,
notes=[
f"obfuscated fingerprinter beacons={beacon}",
f"mint_uuid={mint_uuid}",
],
)

View File

@@ -0,0 +1,89 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""SVG fingerprint canary — standalone SVG with an embedded ``<script>``
that runs the obfuscated fingerprinter when the file is opened directly
in a browser.
SVG ``<script>`` only fires when the SVG is loaded as a top-level
document (or via ``<object>``/``<iframe>``); it's *blocked* when the
SVG is referenced from another page's ``<img>``. That's the right
posture for canary use: an attacker browsing the decky filesystem and
double-clicking a stray ``network_diagram.svg`` triggers it; rendering
inside a sandboxed CMS preview does not.
Same determinism guarantees as :mod:`fingerprint_html`.
"""
from __future__ import annotations
from decnet.canary.base import CanaryArtifact, CanaryContext, CanaryGenerator
from decnet.canary.generators.fingerprint_html import _mint_uuid_for, _stable_int
from decnet.canary.obfuscator import render_fingerprint_js, nonce_for
_DIAGRAM_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 600 360" width="600" height="360">
<style>
.box{{fill:#f7f9fb;stroke:#7a93ad;stroke-width:1.2}}
.lbl{{font:12px Segoe UI,Arial,sans-serif;fill:#2a3a4a}}
.edge{{stroke:#7a93ad;stroke-width:1.2;fill:none}}
.title{{font:bold 14px Segoe UI,Arial,sans-serif;fill:#1a2a3a}}
.cap{{font:10px Segoe UI,Arial,sans-serif;fill:#6a7a8a}}
</style>
<text class="title" x="20" y="28">Network Topology — {region} segment</text>
<text class="cap" x="20" y="44">draft v{ver} · last reviewed {review}</text>
<rect class="box" x="40" y="80" width="120" height="50" rx="4"/>
<text class="lbl" x="100" y="110" text-anchor="middle">edge gw</text>
<rect class="box" x="240" y="80" width="120" height="50" rx="4"/>
<text class="lbl" x="300" y="110" text-anchor="middle">core sw</text>
<rect class="box" x="440" y="80" width="120" height="50" rx="4"/>
<text class="lbl" x="500" y="110" text-anchor="middle">app cluster</text>
<rect class="box" x="240" y="220" width="120" height="50" rx="4"/>
<text class="lbl" x="300" y="250" text-anchor="middle">db tier</text>
<path class="edge" d="M160 105 L240 105"/>
<path class="edge" d="M360 105 L440 105"/>
<path class="edge" d="M300 130 L300 220"/>
<script type="application/ecmascript"><![CDATA[
{payload}
]]></script>
</svg>
"""
_REGIONS = ("us-east", "eu-central", "ap-south", "us-west", "sa-east")
class FingerprintSvgGenerator(CanaryGenerator):
"""Synthesise an SVG that fingerprints the browser opening it."""
name = "fingerprint_svg"
def generate(self, ctx: CanaryContext) -> CanaryArtifact:
mint_uuid = _mint_uuid_for(ctx.callback_token)
nonce = nonce_for(ctx.callback_token, mint_uuid)
payload = render_fingerprint_js(
callback_token=ctx.callback_token,
http_base=ctx.http_base,
mint_uuid=mint_uuid,
nonce=nonce,
)
region = _REGIONS[_stable_int(ctx.callback_token, "reg") % len(_REGIONS)]
ver = 1 + (_stable_int(ctx.callback_token, "ver") % 6)
day = _stable_int(ctx.callback_token, "day") % 28 + 1
body = _DIAGRAM_TEMPLATE.format(
region=region,
ver=ver,
review=f"2026-03-{day:02d}",
payload=payload,
)
beacon = f"{ctx.http_base.rstrip('/')}/c/{ctx.callback_token}"
return CanaryArtifact(
path="",
content=body.encode("utf-8"),
mode=0o644,
mtime_offset=-86400 * 30,
generator=self.name,
fingerprint_nonce=nonce,
notes=[
f"obfuscated fingerprinter beacons={beacon}",
f"mint_uuid={mint_uuid}",
],
)

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Fake ``.git/config`` with an attacker-bait remote URL.
The ``[remote "origin"]`` ``url`` field is the natural place to embed

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Built-in honeydoc — a minimal HTML "report" with a tracking pixel.
This is the *fallback* honeydoc used when the operator hasn't

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Real-DOCX honeydoc generator.
Synthesises a minimal but structurally valid DOCX from scratch via

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Real-PDF honeydoc generator (uses :mod:`pikepdf`).
Builds a one-page PDF with the same Q3-review body as the HTML/DOCX
@@ -43,7 +44,7 @@ class HoneydocPdfGenerator(CanaryGenerator):
def generate(self, ctx: CanaryContext) -> CanaryArtifact:
try:
from pikepdf import Pdf, Name, Dictionary, String # type: ignore[import-not-found]
from pikepdf import Pdf, Name, Dictionary, String
except ImportError as e:
raise InstrumenterRejectedError(
"honeydoc_pdf requires pikepdf; install it (`pip install "

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Fake ``mysqldump`` output that phones home on import.
Mirrors the Canarytokens.org MySQL-dump trick. When a victim runs

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Fake SSH private key with the callback host in the comment.
OpenSSH private keys carry a free-form comment field — typically

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Built-in canary instrumenters (operator-uploaded artifact mutation).
Lazy-imported by :func:`decnet.canary.factory.get_instrumenter`.

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""DOCX instrumenter — inject a remote image into the body.
DOCX files are zip archives carrying ``word/document.xml`` (the body)

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""HTML instrumenter — append a 1×1 tracking pixel.
Stdlib-only. We don't parse the HTML; we just inject the ``<img>``

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Image instrumenter — requires :mod:`PIL` (optional dependency).
For PNG/JPEG/GIF we append a tEXt/EXIF chunk carrying the slug so
@@ -32,7 +33,7 @@ class ImageInstrumenter(CanaryInstrumenter):
self, blob: bytes, ctx: CanaryContext, *, target_path: str,
) -> CanaryArtifact:
try:
from PIL import Image, PngImagePlugin # type: ignore[import-not-found]
from PIL import Image, PngImagePlugin
except ImportError as e:
raise InstrumenterRejectedError(
"image instrumenter requires Pillow; install it (`pip "

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Passthrough instrumenter — bytes go to disk unchanged.
Used as the dispatch fallback for content types we can't safely

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""PDF instrumenter — requires :mod:`pikepdf` (optional dependency).
PDF embedding is non-trivial: the cleanest place to put a callback
@@ -34,7 +35,7 @@ class PdfInstrumenter(CanaryInstrumenter):
self, blob: bytes, ctx: CanaryContext, *, target_path: str,
) -> CanaryArtifact:
try:
import pikepdf # type: ignore[import-not-found]
import pikepdf
except ImportError as e:
raise InstrumenterRejectedError(
"PDF instrumenter requires pikepdf; install it (`pip "

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Plain-text / config-file instrumenter.
Two embedding strategies, picked in order:

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""XLSX instrumenter — embed an external-image link.
XLSX is structurally identical to DOCX (Office Open XML zip). The

178
decnet/canary/obfuscator.py Normal file
View File

@@ -0,0 +1,178 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Per-mint JS obfuscator wrapper.
Thin Python wrapper around the ``javascript-obfuscator`` Node package.
Used by the fingerprint generators / instrumenters to produce a unique,
hard-to-statically-analyse JS blob per canary mint.
Two design choices flow from the canary contract in :mod:`base`:
* **Determinism.** Generators must return byte-identical artifacts for
the same ``(callback_token, http_base, dns_zone, persona)``. We
derive a numeric seed from the callback token and pass it to the
obfuscator's own ``seed`` option, and we derive the polymorphic
config bits from the same hash so a re-mint reproduces exactly.
* **Per-mint uniqueness.** Two different callback tokens produce
structurally different output: different identifier names, different
string-array rotation, optionally different transforms enabled.
The Node helper at ``_obfuscate_helper.js`` is invoked via subprocess.
We pass code+options as JSON on stdin and read the obfuscated result
from stdout. Stderr surfaces obfuscator failures.
"""
from __future__ import annotations
import hashlib
import hmac
import json
import os
import subprocess # nosec B404 — Node helper exec is the whole point
from pathlib import Path
from typing import Any
_HELPER = Path(__file__).parent / "_obfuscate_helper.js"
_PAYLOAD = Path(__file__).parent / "fingerprint_payload.js"
# Node binary path. Honor DECNET_NODE_BIN so deployments can pin a
# specific runtime; default to PATH lookup.
_NODE_BIN = os.environ.get("DECNET_NODE_BIN", "node")
# Hard timeout for the obfuscator subprocess. Real runs on the
# fingerprint payload sit well under 5s on a dev box.
_TIMEOUT_S = 30
class ObfuscatorError(RuntimeError):
"""Raised when the Node helper fails or returns empty output."""
class FingerprintSecretMissing(RuntimeError):
"""Raised when ``DECNET_CANARY_FINGERPRINT_SECRET`` is unset.
Fingerprint canaries embed a per-mint nonce derived from this
server-side secret; without it the worker cannot validate incoming
fingerprint beacons, so we fail loud at mint time rather than ship
a defeatable canary.
"""
_FINGERPRINT_SECRET_ENV = "DECNET_CANARY_FINGERPRINT_SECRET" # nosec B105 — this is an env var name, not a hardcoded password
def nonce_for(callback_token: str, mint_uuid: str) -> str:
"""Compute the per-mint fingerprint nonce.
HMAC-SHA256 keyed on the server-side master secret, message is
``callback_token + "|" + mint_uuid``. Truncated to 16 hex chars
(~64 bits of entropy) — enough to defeat slug-only forgery while
fitting comfortably into a query string.
"""
secret = os.environ.get(_FINGERPRINT_SECRET_ENV, "")
if not secret:
raise FingerprintSecretMissing(
f"{_FINGERPRINT_SECRET_ENV} is unset; fingerprint canaries cannot mint"
)
msg = f"{callback_token}|{mint_uuid}".encode("utf-8")
return hmac.new(secret.encode("utf-8"), msg, hashlib.sha256).hexdigest()[:16]
def _seed_from_token(callback_token: str) -> int:
"""Derive a 31-bit numeric seed from the callback token.
``javascript-obfuscator`` expects ``seed: number`` (int32-ish);
using a SHA-256-derived prefix gives us a uniform distribution
across the 31-bit positive range.
"""
h = hashlib.sha256(callback_token.encode("utf-8")).digest()
return int.from_bytes(h[:4], "big") & 0x7FFFFFFF
def _config_from_seed(seed: int) -> dict[str, Any]:
"""Build a deterministic, per-mint obfuscator config.
The hash bits drive *which* transforms apply — two mints get
structurally different outputs, not just different identifier names.
Defaults stay aggressive enough that reverse engineering is real
work; we never disable string-array or rename, only vary the dial.
"""
bits = seed
encodings = ("base64", "rc4")
string_array_encoding = [encodings[bits & 1]]
control_flow_threshold = 0.5 + ((bits >> 1) & 0xFF) / 512.0 # 0.5 .. ~1.0
dead_code_threshold = 0.2 + ((bits >> 9) & 0xFF) / 512.0 # 0.2 .. ~0.7
transform_object_keys = bool((bits >> 17) & 1)
numbers_to_expressions = bool((bits >> 18) & 1)
simplify = bool((bits >> 19) & 1)
return {
"compact": True,
"seed": seed,
"controlFlowFlattening": True,
"controlFlowFlatteningThreshold": round(control_flow_threshold, 3),
"deadCodeInjection": True,
"deadCodeInjectionThreshold": round(dead_code_threshold, 3),
"stringArray": True,
"stringArrayEncoding": string_array_encoding,
"stringArrayThreshold": 1,
"stringArrayRotate": True,
"stringArrayShuffle": True,
"splitStrings": True,
"splitStringsChunkLength": 4 + (bits & 7),
"transformObjectKeys": transform_object_keys,
"numbersToExpressions": numbers_to_expressions,
"simplify": simplify,
"selfDefending": False, # breaks SVG embed; not worth the cost
"renameGlobals": False,
"identifierNamesGenerator": "mangled-shuffled",
}
def obfuscate(code: str, *, callback_token: str) -> str:
"""Obfuscate *code* deterministically per *callback_token*.
Raises :class:`ObfuscatorError` if Node fails or returns empty.
"""
seed = _seed_from_token(callback_token)
options = _config_from_seed(seed)
payload = json.dumps({"code": code, "options": options})
try:
proc = subprocess.run( # nosec B603 — argv-form, no shell, fixed helper path; payload is JSON on stdin, not in argv
[_NODE_BIN, str(_HELPER)],
input=payload, capture_output=True, text=True,
timeout=_TIMEOUT_S, check=False,
)
except FileNotFoundError as e:
raise ObfuscatorError(f"node binary not found: {_NODE_BIN!r}") from e
except subprocess.TimeoutExpired as e:
raise ObfuscatorError("javascript-obfuscator timed out") from e
if proc.returncode != 0:
raise ObfuscatorError(
f"javascript-obfuscator failed rc={proc.returncode} "
f"stderr={proc.stderr.strip()[:400]}"
)
out = proc.stdout
if not out.strip():
raise ObfuscatorError("javascript-obfuscator returned empty output")
return out
def render_fingerprint_js(
*, callback_token: str, http_base: str, mint_uuid: str, nonce: str,
) -> str:
"""Build the obfuscated fingerprint JS for a single mint.
Substitutes ``{{BEACON_URL}}``, ``{{MINT_UUID}}``, and
``{{MINT_NONCE}}`` in the payload template, then runs it through
:func:`obfuscate` with a seed derived from the callback token.
The nonce is appended as ``&k=`` on every beacon URL the JS emits;
the worker rejects fingerprint payloads whose ``?k=`` doesn't match
the row's :attr:`CanaryToken.fingerprint_nonce`.
"""
template = _PAYLOAD.read_text(encoding="utf-8")
beacon = f"{http_base.rstrip('/')}/c/{callback_token}"
src = (
template
.replace("{{BEACON_URL}}", beacon)
.replace("{{MINT_UUID}}", mint_uuid)
.replace("{{MINT_NONCE}}", nonce)
)
return obfuscate(src, callback_token=callback_token)

View File

@@ -0,0 +1,10 @@
{
"name": "decnet-canary-obfuscator",
"version": "0.1.0",
"private": true,
"description": "Node helper for decnet.canary.obfuscator — javascript-obfuscator wrapper invoked via subprocess.",
"main": "_obfuscate_helper.js",
"dependencies": {
"javascript-obfuscator": "^5.4.2"
}
}

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Persona-aware path resolution for canary artifacts.
Linux-persona deckies use POSIX-shaped paths under ``/home/<user>``.
@@ -28,6 +29,8 @@ _LINUX_DEFAULTS: dict[str, str] = {
"honeydoc": "/home/{user}/Documents/quarterly_report.html",
"honeydoc_docx": "/home/{user}/Documents/quarterly_report.docx",
"honeydoc_pdf": "/home/{user}/Documents/quarterly_report.pdf",
"fingerprint_html": "/home/{user}/Documents/asset_directory.html",
"fingerprint_svg": "/home/{user}/Documents/network_topology.svg",
}
_WINDOWS_DEFAULTS: dict[str, str] = {
@@ -38,6 +41,8 @@ _WINDOWS_DEFAULTS: dict[str, str] = {
"honeydoc": "/home/{user}/Documents/quarterly_report.html",
"honeydoc_docx": "/home/{user}/Documents/quarterly_report.docx",
"honeydoc_pdf": "/home/{user}/Documents/quarterly_report.pdf",
"fingerprint_html": "/home/{user}/Documents/asset_directory.html",
"fingerprint_svg": "/home/{user}/Documents/network_topology.svg",
}

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Plant / revoke canary artifacts inside running decky containers.
Single entry point per operation:
@@ -20,11 +21,8 @@ shape but speaks bytes-via-base64 over the wire.
"""
from __future__ import annotations
import asyncio
import base64
import os
import shlex
import time
from datetime import datetime, timedelta, timezone
from secrets import token_urlsafe
from typing import Any, Iterable, Optional
@@ -34,13 +32,16 @@ from decnet.bus.factory import get_bus
from decnet.canary.base import CanaryArtifact, CanaryContext
from decnet.canary.factory import get_generator
from decnet.canary.paths import default_path_for
from decnet.decky_io import (
delete_file_from_container,
resolve_topology_container,
write_file_to_container,
)
from decnet.logging import get_logger
from decnet.web.db.repository import BaseRepository
log = get_logger("canary.planter")
_DOCKER = "docker"
_TIMEOUT = 8.0
# Container suffix — matches the orchestrator SSH driver's convention
# (``<decky_name>-ssh``). Canary placement always happens through the
# ssh container because every decky has one and it carries the most
@@ -52,62 +53,16 @@ def _container_for(decky_name: str) -> str:
return f"{decky_name}{_SSH_CONTAINER_SUFFIX}"
def _dirname(path: str) -> str:
idx = path.rfind("/")
if idx <= 0:
return "/"
return path[:idx]
async def _run(
argv: list[str], *, stdin_bytes: Optional[bytes] = None,
) -> tuple[int, str, str]:
try:
proc = await asyncio.create_subprocess_exec(
*argv,
stdin=asyncio.subprocess.PIPE if stdin_bytes is not None else None,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
except FileNotFoundError as exc:
return 127, "", f"argv[0] not found: {exc}"
try:
stdout, stderr = await asyncio.wait_for(
proc.communicate(input=stdin_bytes), timeout=_TIMEOUT,
)
except asyncio.TimeoutError:
try:
proc.kill()
except ProcessLookupError:
pass
return 124, "", "timeout"
return (
proc.returncode if proc.returncode is not None else -1,
stdout.decode("utf-8", "replace"),
stderr.decode("utf-8", "replace"),
)
def _build_plant_command(artifact: CanaryArtifact) -> tuple[str, bytes]:
"""Compose the ``sh -c`` script + stdin payload for one artifact.
Binary safety: we base64-encode on the host and stream the result
over stdin to ``base64 -d`` inside the container, so the bytes
never touch the argv (kernel ARG_MAX would reject anything larger
than ~128KB-2MB depending on the host). Both ``base64`` (coreutils)
and ``touch -d @<unix_ts>`` are present on every Linux base image
we ship, so there's no per-distro branching.
"""
encoded = base64.b64encode(artifact.content)
mtime = int(time.time() + artifact.mtime_offset)
mode_str = oct(artifact.mode)[2:]
parts = [
f"mkdir -p {shlex.quote(_dirname(artifact.path))}",
f"base64 -d > {shlex.quote(artifact.path)}",
f"chmod {mode_str} {shlex.quote(artifact.path)}",
f"touch -d @{mtime} {shlex.quote(artifact.path)}",
]
return " && ".join(parts), encoded
# resolve_topology_container is re-exported from decky_io for back-compat
# with callers (tests, deploy hook) that imported it from this module
# before the decky_io extraction.
__all__ = [
"plant",
"revoke",
"resolve_topology_container",
"seed_baseline",
"seed_baseline_topology",
]
async def _publish(
@@ -139,6 +94,7 @@ async def plant(
repo: Optional[BaseRepository] = None,
publish: bool = True,
bus: Optional[BaseBus] = None,
container: Optional[str] = None,
) -> tuple[bool, Optional[str]]:
"""Write *artifact* into the decky's ssh container.
@@ -157,13 +113,12 @@ async def plant(
await repo.update_canary_token_state(token_uuid, "failed", err)
return False, err
sh_cmd, stdin_payload = _build_plant_command(artifact)
# ``-i`` keeps stdin attached so base64 -d inside the container can
# consume the encoded payload streamed from the host.
argv = [_DOCKER, "exec", "-i", _container_for(decky_name), "sh", "-c", sh_cmd]
rc, _stdout, stderr = await _run(argv, stdin_bytes=stdin_payload)
success = rc == 0
error = None if success else (stderr.strip()[:256] or f"rc={rc}")
target_container = container or _container_for(decky_name)
mtime = datetime.now(timezone.utc) + timedelta(seconds=artifact.mtime_offset)
success, error = await write_file_to_container(
target_container, artifact.path, artifact.content,
mode=artifact.mode, mtime=mtime,
)
if repo is not None:
if success:
@@ -182,8 +137,8 @@ async def plant(
if not success:
log.warning(
"canary.plant failed decky=%s token=%s rc=%d stderr=%r",
decky_name, token_uuid, rc, stderr[:120],
"canary.plant failed decky=%s token=%s container=%s err=%r",
decky_name, token_uuid, target_container, error,
)
return success, error
@@ -196,6 +151,7 @@ async def revoke(
repo: Optional[BaseRepository] = None,
publish: bool = True,
bus: Optional[BaseBus] = None,
container: Optional[str] = None,
) -> tuple[bool, Optional[str]]:
"""Best-effort unlink + state transition + bus publish.
@@ -203,11 +159,10 @@ async def revoke(
the file is gone after the call (whether we deleted it or it was
already missing); only docker / container-down errors return False.
"""
sh_cmd = f"rm -f {shlex.quote(placement_path)}"
argv = [_DOCKER, "exec", _container_for(decky_name), "sh", "-c", sh_cmd]
rc, _stdout, stderr = await _run(argv)
success = rc == 0
error = None if success else (stderr.strip()[:256] or f"rc={rc}")
target_container = container or _container_for(decky_name)
success, error = await delete_file_from_container(
target_container, placement_path,
)
if repo is not None:
await repo.update_canary_token_state(token_uuid, "revoked", error if not success else None)
@@ -250,6 +205,7 @@ async def seed_baseline(
persona: str = "linux",
created_by: str = "system",
bus: Optional[BaseBus] = None,
container: Optional[str] = None,
) -> list[dict[str, Any]]:
"""Plant the configured baseline canary set on one decky.
@@ -293,9 +249,59 @@ async def seed_baseline(
await plant(
decky_name, artifact,
token_uuid=token_uuid, repo=repo, publish=True, bus=bus,
container=container,
)
out.append({
"token_uuid": token_uuid, "generator": gen_name, "kind": kind,
"callback_token": slug, "placement_path": artifact.path,
})
return out
async def seed_baseline_topology(
repo: BaseRepository,
topology_id: str,
*,
created_by: str = "system",
bus: Optional[BaseBus] = None,
) -> list[dict[str, Any]]:
"""Plant baseline canaries on every decky in a MazeNET topology.
Mirrors :func:`seed_baseline` for the topology path. Container name
resolution uses :func:`resolve_topology_container` since topology
deckies may not have an ssh service — in that case we target the
base container instead.
Best-effort: failures on any single decky are logged inside
:func:`plant`; the deploy hook treats the return value as
informational. Returns a flat list of per-token dicts (with an added
``decky_name`` key) across all deckies.
"""
from decnet.topology.persistence import hydrate
hydrated = await hydrate(repo, topology_id)
if hydrated is None:
log.warning(
"canary.seed_baseline_topology: topology %s not found", topology_id,
)
return []
out: list[dict[str, Any]] = []
for decky in hydrated["deckies"]:
cfg = decky.get("decky_config") or {}
decky_name = cfg.get("name") or decky.get("name")
if not decky_name:
continue
services = decky.get("services") or []
container = resolve_topology_container(topology_id, decky_name, services)
# MazeNET deckies don't carry an OS persona today; default to
# linux (every base image we ship is Linux).
rows = await seed_baseline(
decky_name, repo,
persona="linux", created_by=created_by, bus=bus,
container=container,
)
for r in rows:
r["decky_name"] = decky_name
out.append(r)
return out

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Filesystem store for operator-uploaded canary blobs.
Blobs live under ``/var/lib/decnet/canary/blobs/<sha256>`` (override

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""``decnet canary`` worker — HTTP + DNS callback receivers.
Two surfaces, one process:
@@ -26,9 +27,14 @@ crashes loudly rather than masking failures.
from __future__ import annotations
import asyncio
import base64
import binascii
import json
import os
import time
import uuid
from datetime import datetime, timezone
from typing import Optional
from typing import Any, Optional
from fastapi import FastAPI, Request, Response
@@ -50,6 +56,41 @@ _TRANSPARENT_GIF = bytes.fromhex(
)
# Namespace used by fingerprint generators to derive mint UUID.
# Must stay in sync with fingerprint_html._MINT_NAMESPACE.
_MINT_NAMESPACE = uuid.UUID("a3f7c821-9d1e-4b6a-8c2d-1e4f9a7b3c5d")
# In-memory per-(token_uuid, src_ip) rate limiter for fingerprint persists.
# Maps (token_uuid, src_ip) -> list of monotonic timestamps.
# Not shared across worker restarts or processes — acceptable for MVP.
_FP_RATE_WINDOW_S = 60
_FP_RATE_LIMIT = 30
_fp_rate_buckets: dict[tuple[str, str], list[float]] = {}
def _fp_rate_allowed(token_uuid: str, src_ip: str) -> bool:
key = (token_uuid, src_ip)
now = time.monotonic()
cutoff = now - _FP_RATE_WINDOW_S
bucket = _fp_rate_buckets.get(key, [])
bucket = [t for t in bucket if t > cutoff]
if len(bucket) >= _FP_RATE_LIMIT:
_fp_rate_buckets[key] = bucket
return False
bucket.append(now)
_fp_rate_buckets[key] = bucket
return True
def _is_valid_fp_shape(fp: dict) -> bool:
"""Layer B — structural sanity check on a decoded fingerprint blob."""
if not isinstance(fp.get("mint"), str) or not fp["mint"]:
return False
known_keys = {"nav", "scr", "tz", "cv", "gl", "au", "ft", "rtc"}
present = sum(1 for k in known_keys if isinstance(fp.get(k), dict))
return present >= 3
def _http_base() -> str:
return os.environ.get("DECNET_CANARY_HTTP_BASE", "http://localhost:8088").rstrip("/")
@@ -104,6 +145,11 @@ def _build_app(repo: BaseRepository, bus: BaseBus) -> FastAPI:
@app.get("/c/{slug}")
async def callback(slug: str, request: Request) -> Response:
raw_nonce = request.query_params.get("k")
fp_meta, parsed_fp = _extract_fingerprint(request.query_params)
merged_headers = dict(request.headers)
if fp_meta:
merged_headers.update(fp_meta)
await _record_hit(
repo, bus,
slug=slug,
@@ -111,7 +157,9 @@ def _build_app(repo: BaseRepository, bus: BaseBus) -> FastAPI:
user_agent=request.headers.get("user-agent"),
request_path=str(request.url.path),
dns_qname=None,
raw_headers=dict(request.headers),
raw_headers=merged_headers,
parsed_fp=parsed_fp,
raw_nonce=raw_nonce,
)
# Always 200 with a tiny image so the attacker's client sees
# a "success" — same return regardless of whether the slug is
@@ -129,6 +177,67 @@ def _build_app(repo: BaseRepository, bus: BaseBus) -> FastAPI:
return app
# Per-chunk size cap. Real fingerprints fit in one ~3KB GET; honest
# overflow is handled via chunking (s/i/n + d). Anything larger than
# this on a single request is junk, so we drop it instead of letting an
# attacker inflate a trigger row indefinitely.
_FP_CHUNK_MAX = 8 * 1024
def _extract_fingerprint(qp: Any) -> tuple[dict[str, Any], Optional[dict]]:
"""Decode fingerprint-payload query params into (meta_dict, parsed_fp).
The obfuscated browser payload may send three shapes on ``GET /c/<slug>``:
* ``?o=1`` — bare-open beacon, fired before fingerprinting starts.
* ``?d=<b64url-json>`` — single-shot fingerprint dump.
* ``?s=<sid>&i=<idx>&n=<total>&d=<b64url-chunk>`` — chunked dump.
Returns a tuple of:
- ``meta`` — flat dict with ``_fp_*`` keys to merge into raw_headers.
- ``parsed_fp`` — the decoded fingerprint dict for validation, or ``None``
when there's no ``?d=`` or decoding fails.
"""
out: dict[str, Any] = {}
parsed_fp: Optional[dict] = None
if not qp:
return out, parsed_fp
o = qp.get("o") if hasattr(qp, "get") else None
if o:
out["_fp_open"] = "1"
d = qp.get("d") if hasattr(qp, "get") else None
if not d:
return out, parsed_fp
if len(d) > _FP_CHUNK_MAX:
out["_fp_oversize"] = "1"
return out, parsed_fp
sid = qp.get("s")
idx = qp.get("i")
total = qp.get("n")
if sid and idx and total:
out["_fp_sid"] = sid
out["_fp_idx"] = idx
out["_fp_total"] = total
out["_fp_chunk"] = d
return out, parsed_fp
# Single-shot: decode and pass back as parsed_fp; validation runs in
# _record_hit after token lookup so we have the stored nonce at hand.
try:
padded = d + "=" * (-len(d) % 4)
raw = base64.urlsafe_b64decode(padded.encode("ascii"))
parsed = json.loads(raw.decode("utf-8"))
except (binascii.Error, ValueError, UnicodeDecodeError):
out["_fp_decode_error"] = "1"
return out, parsed_fp
if isinstance(parsed, dict):
parsed_fp = parsed
else:
out["_fp_decode_error"] = "1"
return out, parsed_fp
def _client_ip(request: Request) -> str:
# Honor X-Forwarded-For if the operator deployed behind a reverse
# proxy. Take the leftmost address in the chain; everything after
@@ -154,16 +263,58 @@ async def _record_hit(
request_path: Optional[str],
dns_qname: Optional[str],
raw_headers: Optional[dict],
parsed_fp: Optional[dict] = None,
raw_nonce: Optional[str] = None,
) -> None:
"""Resolve slug -> token, persist a trigger, publish on the bus.
Unknown slugs are silently swallowed: returning the same response
for known and unknown slugs is the stealth posture, and persisting
every random scan would clutter the DB.
When *parsed_fp* is present (single-shot fingerprint decode succeeded),
it is validated through four layers before being merged into raw_headers:
A) nonce match against CanaryToken.fingerprint_nonce,
B) structural shape check,
C) mint UUID consistency,
D) per-(token, IP) rate limit.
Each failure drops the structured ``_fp`` and sets a ``_fp_*_invalid`` flag.
The trigger row always lands regardless — the GET hit is itself forensic.
"""
token = await repo.get_canary_token_by_slug(slug)
if token is None:
return
final_headers: dict[str, Any] = dict(raw_headers or {})
if parsed_fp is not None:
stored_nonce: Optional[str] = token.get("fingerprint_nonce")
# Layer A — nonce
if stored_nonce is not None and raw_nonce != stored_nonce:
final_headers["_fp_invalid_nonce"] = "1"
parsed_fp = None
# Layer B — shape (only when nonce passed or no nonce enforced)
if parsed_fp is not None and not _is_valid_fp_shape(parsed_fp):
final_headers["_fp_invalid_shape"] = "1"
parsed_fp = None
# Layer C — mint UUID consistency
if parsed_fp is not None:
expected_mint = str(uuid.uuid5(_MINT_NAMESPACE, slug))
if parsed_fp.get("mint") != expected_mint:
final_headers["_fp_invalid_mint"] = "1"
parsed_fp = None
# Layer D — rate limit
if parsed_fp is not None and not _fp_rate_allowed(token["uuid"], src_ip):
final_headers["_fp_rate_limited"] = "1"
parsed_fp = None
if parsed_fp is not None:
final_headers["_fp"] = parsed_fp
trigger_id = await repo.record_canary_trigger({
"token_uuid": token["uuid"],
"occurred_at": datetime.now(timezone.utc),
@@ -171,7 +322,7 @@ async def _record_hit(
"user_agent": user_agent,
"request_path": request_path,
"dns_qname": dns_qname,
"raw_headers": raw_headers or {},
"raw_headers": final_headers,
})
try:
await bus.publish(
@@ -189,6 +340,22 @@ async def _record_hit(
except Exception as e: # noqa: BLE001 — best effort
log.warning("canary.triggered publish failed slug=%s err=%s", slug, e)
# Auto-deregister fingerprint canaries after the first valid fingerprint
# is collected. Slug goes dark; the stealth posture means the attacker
# sees the same 200 + GIF on the next hit — nothing reveals the revocation.
# Guard: only fingerprint tokens have a non-NULL fingerprint_nonce; plain
# http/dns canaries are NOT auto-revoked.
if parsed_fp is not None and token.get("fingerprint_nonce") is not None:
try:
await repo.update_canary_token_state(token["uuid"], "revoked")
await bus.publish(
topics.canary(token["uuid"], topics.CANARY_REVOKED),
{"token_id": token["uuid"], "trigger_id": trigger_id,
"reason": "fingerprint_collected"},
)
except Exception as e: # noqa: BLE001 — trigger row already landed; best effort
log.warning("canary.deregister failed token=%s err=%s", token["uuid"], e)
# ---------------------------- DNS surface --------------------------------
@@ -214,7 +381,7 @@ async def _start_dns_server(
local_addr=(_dns_bind(), _dns_port()),
)
log.info("canary.dns listening zone=%s port=%d", zone, _dns_port())
return transport # type: ignore[return-value]
return transport
# ---------------------------- entry point --------------------------------

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""
DECNET CLI — entry point for all commands.
@@ -39,6 +40,7 @@ from . import (
swarm,
swarmctl,
topology,
ttp,
updater,
web,
webhook,
@@ -59,7 +61,7 @@ for _mod in (
swarm,
deploy, lifecycle, workers, inventory,
web, profiler, orchestrator, realism, reconciler, sniffer, db,
topology, bus, geoip, init, webhook, canary,
topology, bus, geoip, init, webhook, canary, ttp,
):
_mod.register(app)

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import os

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import os

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import typer

View File

@@ -1,8 +1,14 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""``decnet canary`` — HTTP + DNS callback receiver for canary tokens.
Worker process. Mirrors the shape of :mod:`decnet.cli.webhook`: a
``@app.command(name="canary")`` Typer entry point that delegates to
:func:`decnet.canary.worker.run`.
Two entry points share this module:
* ``decnet canary`` — runs the worker process. Mirrors the shape of
:mod:`decnet.cli.webhook`. Invoked by the ``decnet-canary.service``
systemd unit so its argv must stay stable.
* ``decnet canary-install-toolchain`` — provisions the Node side of
the fingerprint-canary obfuscator. Idempotent; safe to call from
the API service unit's ``ExecStartPre``.
Not master-only — any host that hosts deckies can run its own
canary worker (the bus events stay local; the webhook worker on
@@ -11,11 +17,17 @@ in ``development/let-s-move-to-the-enumerated-pike.md``).
"""
from __future__ import annotations
import shutil
import subprocess # nosec B404 — npm exec is the whole point of the toolchain installer
from pathlib import Path
import typer
from . import utils as _utils
from .utils import console, log
_TOOLCHAIN_TIMEOUT_S = 180
def register(app: typer.Typer) -> None:
@app.command(name="canary")
@@ -40,3 +52,53 @@ def register(app: typer.Typer) -> None:
asyncio.run(run())
except KeyboardInterrupt:
console.print("\n[yellow]Canary worker stopped.[/]")
@app.command(name="canary-install-toolchain")
def canary_install_toolchain(
npm_bin: str = typer.Option(
"npm", "--npm-bin", help="Path to the npm executable. Defaults to PATH lookup.",
),
) -> None:
"""Install the Node-side toolchain used by fingerprint canaries.
Runs ``npm install --omit=dev`` under the installed ``decnet/canary/``
directory so the obfuscator's helper script can ``require()``
``javascript-obfuscator`` at mint time. Requires Node >= 18.
Idempotent: re-running on an already-installed tree is fast
(npm short-circuits when ``node_modules/`` is up-to-date).
"""
import decnet.canary as _canary_pkg
canary_dir = Path(_canary_pkg.__file__).resolve().parent
if not (canary_dir / "package.json").is_file():
console.print(
f"[red]canary package.json not found under {canary_dir}; "
"wheel may be missing the JS toolchain payload.[/]"
)
raise typer.Exit(code=2)
if shutil.which(npm_bin) is None:
console.print(
f"[red]npm executable {npm_bin!r} not found on PATH. "
"Install Node >= 18 and re-run.[/]"
)
raise typer.Exit(code=2)
console.print(
f"[cyan]installing canary toolchain[/] in {canary_dir}",
)
try:
proc = subprocess.run( # nosec B603 — argv-form, no shell, fixed cwd, npm_bin checked above
[npm_bin, "install", "--omit=dev", "--no-fund", "--no-audit"],
cwd=str(canary_dir),
capture_output=True, text=True,
timeout=_TOOLCHAIN_TIMEOUT_S, check=False,
)
except subprocess.TimeoutExpired:
console.print("[red]npm install timed out after 3 minutes[/]")
raise typer.Exit(code=3) from None
if proc.returncode != 0:
console.print(
f"[red]npm install failed rc={proc.returncode}[/]\n"
f"{proc.stderr.strip()}"
)
raise typer.Exit(code=proc.returncode)
console.print("[green]canary toolchain ready[/]")

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
from typing import Optional

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
from typing import Optional

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import asyncio

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Role-based CLI gating.
MAINTAINERS: when you add a new Typer command (or add_typer group) that is
@@ -30,6 +31,10 @@ MASTER_ONLY_COMMANDS: frozenset[str] = frozenset({
"mutate", "listener", "profiler",
"services", "distros", "correlate", "archetypes", "web",
"db-reset", "init", "webhook", "clusterer", "campaign-clusterer",
# `ttp` runs on agents — local SMTP decoys persist .eml files into the
# agent's artifacts tree and the EmailLifter disk-reaches them in-process
# (DEBT-047). `ttp-backfill` stays master-only: it walks the master DB.
"ttp-backfill",
})
MASTER_ONLY_GROUPS: frozenset[str] = frozenset(
{"swarm", "topology", "geoip", "realism"}
@@ -65,7 +70,7 @@ def _gate_commands_by_mode(_app: typer.Typer) -> None:
return
_app.registered_commands = [
c for c in _app.registered_commands
if (c.name or c.callback.__name__) not in MASTER_ONLY_COMMANDS
if (c.name or (c.callback.__name__ if c.callback else "")) not in MASTER_ONLY_COMMANDS
]
_app.registered_groups = [
g for g in _app.registered_groups

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""GeoIP CLI — refresh and lookup subcommands (master-only).
Usage::

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""
`decnet init` — one-shot master-host bootstrap.
@@ -44,6 +45,12 @@ _CONFIG_PLACEHOLDER = """\
# EnvironmentFile= — never in a group-readable INI.
[decnet]
# DECNET-service user/group as configured at `decnet init` time.
# Resolved to a uid/gid on each host at deploy time via pwd.getpwnam,
# so the same user name can have different numeric uids on master vs
# agents without breaking artifact ownership.
api-user = {api_user}
api-group = {api_group}
# mode = master # or "agent"
# [api]
@@ -74,6 +81,7 @@ _CONFIG_PLACEHOLDER = """\
# master-host = 10.0.0.1
# syslog-port = 6514
# swarmctl-port = 8770
# swarmctl-host = 127.0.0.1
# [logging]
# system-log = /var/log/decnet/decnet.system.log
@@ -197,14 +205,17 @@ def _ensure_dir(
return f"skip: {path} already present" if existed else "ok"
def _ensure_config(path: Path, group: str, *, dry_run: bool) -> str:
def _ensure_config(
path: Path, group: str, *, user: str, dry_run: bool,
) -> str:
if path.exists():
return f"skip: {path} already present"
if dry_run:
console.print(f" [dim]would write:[/] {path}")
return "ok"
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(_CONFIG_PLACEHOLDER)
rendered = _CONFIG_PLACEHOLDER.format(api_user=user, api_group=group)
path.write_text(rendered)
try:
os.chmod(path, 0o640)
gid = grp.getgrnam(group).gr_gid
@@ -601,7 +612,7 @@ def register(app: typer.Typer) -> None:
# (Path("/"). / "/opt/decnet" == Path("/opt/decnet"), dropping pfx).
_install_rel = install_dir.lstrip("/")
required_tools = ("systemctl",) if deinit else (
required_tools: tuple[str, ...] = ("systemctl",) if deinit else (
"systemctl", "useradd", "groupadd", "systemd-tmpfiles",
)
if deinit:
@@ -658,7 +669,7 @@ def register(app: typer.Typer) -> None:
)
_step(
"systemctl daemon-reload",
lambda: (_run(["systemctl", "daemon-reload"], dry_run=dry_run), "ok")[1],
lambda: (_run(["systemctl", "daemon-reload"], dry_run=dry_run), "ok")[1], # type: ignore[func-returns-value]
)
_step(
f"remove {etc_decnet / 'decnet.ini'}",
@@ -754,6 +765,13 @@ def register(app: typer.Typer) -> None:
(pfx / _install_rel, 0o755, user, group),
(pfx / "var/lib/decnet", 0o750, user, group),
(pfx / "var/lib/decnet/geoip", 0o755, user, group),
# DEBT-035 / DEBT-047: artifact root carries setgid (the
# 0o2... bit) so every file written under it inherits the
# decnet group regardless of which container's uid created
# it. Group-write (0o2775) lets the API process and the
# local TTP worker read each other's outputs without a
# manual chown after every fresh deploy.
(pfx / "var/lib/decnet/artifacts", 0o2775, user, group),
(pfx / "var/log/decnet", 0o750, user, group),
(etc_decnet, 0o755, "root", group),
(pfx / "run/decnet", 0o755, "root", group),
@@ -775,12 +793,15 @@ def register(app: typer.Typer) -> None:
for path, mode, d_owner, d_group in dirs:
_step(
f"ensure dir {path}",
lambda p=path, m=mode, o=d_owner, g=d_group:
lambda p=path, m=mode, o=d_owner, g=d_group: # type: ignore[misc]
_ensure_dir(p, mode=m, owner=o, group=g, dry_run=dry_run),
)
_step(
f"write {etc_decnet / 'decnet.ini'}",
lambda: _ensure_config(etc_decnet / "decnet.ini", group, dry_run=dry_run),
lambda: _ensure_config(
etc_decnet / "decnet.ini", group,
user=user, dry_run=dry_run,
),
)
_step(
"install systemd units",
@@ -812,7 +833,7 @@ def register(app: typer.Typer) -> None:
)
_step(
"systemctl daemon-reload",
lambda: (_run(["systemctl", "daemon-reload"], dry_run=dry_run), "ok")[1],
lambda: (_run(["systemctl", "daemon-reload"], dry_run=dry_run), "ok")[1], # type: ignore[func-returns-value]
)
if no_start:
@@ -823,7 +844,7 @@ def register(app: typer.Typer) -> None:
_step(
"systemctl enable --now decnet.target",
lambda: (
_run(
_run( # type: ignore[func-returns-value]
["systemctl", "enable", "--now", "decnet.target"],
dry_run=dry_run,
),

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import typer

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import subprocess # nosec B404

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import asyncio

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
from typing import Optional

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import typer

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""``decnet realism ...`` — content-engine maintenance commands.
After stage 5 of the realism migration, this is the only remaining

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import typer

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import typer

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""`decnet swarm ...` — master-side operator commands (HTTP to local swarmctl)."""
from __future__ import annotations

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import os
@@ -16,8 +17,16 @@ from .utils import console, log
def register(app: typer.Typer) -> None:
@app.command()
def swarmctl(
port: int = typer.Option(8770, "--port", help="Port for the swarm controller"),
host: str = typer.Option("127.0.0.1", "--host", help="Bind address for the swarm controller"),
port: int = typer.Option(
8770, "--port",
envvar="DECNET_SWARMCTL_PORT",
help="Port for the swarm controller. Defaults to [swarm] swarmctl-port from /etc/decnet/decnet.ini, else 8770.",
),
host: str = typer.Option(
"127.0.0.1", "--host",
envvar="DECNET_SWARMCTL_HOST",
help="Bind address for the swarm controller. Defaults to [swarm] swarmctl-host from /etc/decnet/decnet.ini, else 127.0.0.1.",
),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
no_listener: bool = typer.Option(False, "--no-listener", help="Do not auto-spawn the syslog-TLS listener alongside swarmctl"),
tls: bool = typer.Option(False, "--tls", help="Serve over HTTPS with mTLS (required for cross-host worker heartbeats)"),

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""MazeNET topology CLI: generate / deploy / teardown / list / show."""
from __future__ import annotations
@@ -233,8 +234,8 @@ def _delete(
topo = await repo.get_topology(topology_id)
if topo is None:
return False, "not-found"
if topo["status"] in _RUNNING:
return False, str(topo["status"])
if topo.status in _RUNNING:
return False, str(topo.status)
ok = await repo.delete_topology_cascade(topology_id)
return ok, None

310
decnet/cli/ttp.py Normal file
View File

@@ -0,0 +1,310 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""``decnet ttp`` — TTP-tagging worker and admin commands.
Two flat commands share this module:
* ``decnet ttp`` — runs the long-running tagger worker. Bus-woken on
``attacker.session.ended`` / ``attacker.observed`` /
``attacker.intel.enriched`` / ``identity.{formed,merged}`` /
``credential.reuse.detected`` / ``email.received`` / ``canary.>``;
dispatches each event through :class:`CompositeTagger` (RuleEngine +
Behavioral / Intel / CanaryFingerprint / Email / Identity / Credential
lifters), persists ``ttp_tag`` rows via the idempotent
``INSERT OR IGNORE`` write, and publishes ``ttp.tagged`` +
``ttp.rule.fired.<technique_id>`` only when the insert returned a
non-zero rowcount (loop-prevention invariant from TTP_TAGGING.md
§"Bus topics"). Invoked by the ``decnet-ttp.service`` systemd unit
so its argv must stay stable.
* ``decnet ttp-backfill`` — replays historical events (shell commands
recorded on :class:`Attacker.commands`, :class:`CanaryTrigger` rows)
through the live tagger. Writes ``ttp_tag`` rows using the same
idempotent insert path. **Does not publish** to the bus — replay must
not re-trigger SIEM/webhook fan-out on already-attributed events.
Both are master-only — gated via ``MASTER_ONLY_COMMANDS`` in
:mod:`decnet.cli.gating`.
"""
from __future__ import annotations
import asyncio
import time
from datetime import datetime, timedelta, timezone
from typing import Any
import typer
from decnet.ttp.factory import CompositeTagger, get_tagger
from . import utils as _utils
from .utils import console, log
_BACKFILL_SOURCES = ("command", "canary", "all")
def register(app: typer.Typer) -> None:
@app.command(name="ttp")
def ttp(
poll_interval_secs: float = typer.Option(
60.0, "--poll-interval", "-i",
help="Slow-tick fallback when the bus is idle or unavailable (seconds)",
),
daemon: bool = typer.Option(
False, "--daemon", "-d",
help="Detach to background as a daemon process",
),
) -> None:
"""TTP-tagging worker — MITRE ATT&CK technique tagging."""
from decnet.ttp.worker import run_ttp_worker_loop
from decnet.web.dependencies import repo
if daemon:
log.info("ttp daemonizing poll=%s", poll_interval_secs)
_utils._daemonize()
log.info("ttp command invoked poll=%s", poll_interval_secs)
console.print(
f"[bold cyan]TTP tagging worker starting[/] "
f"poll={poll_interval_secs}s"
)
console.print("[dim]Press Ctrl+C to stop[/]")
async def _run() -> None:
await repo.initialize()
await run_ttp_worker_loop(
repo, poll_interval_secs=poll_interval_secs,
)
try:
asyncio.run(_run())
except KeyboardInterrupt:
console.print("\n[yellow]TTP tagging worker stopped.[/]")
@app.command(name="ttp-backfill")
def ttp_backfill(
since_days: int = typer.Option(
7, "--since-days", "-s",
min=1, max=3650,
help="Replay events whose source row is newer than N days ago.",
),
source: str = typer.Option(
"all", "--source",
help=f"Source slice to replay. One of: {', '.join(_BACKFILL_SOURCES)}.",
),
dry_run: bool = typer.Option(
False, "--dry-run",
help="Run the tagger but skip insert_tags. Reports counts only.",
),
batch_size: int = typer.Option(
500, "--batch-size",
min=1, max=100_000,
help="Number of tags accumulated before each repo.insert_tags call.",
),
) -> None:
"""Replay historical attacker activity through the live tagger.
Walks ``Attacker.commands`` (per-IP shell-command history) and
``CanaryTrigger`` (canary callback log) since N days ago,
builds the same :class:`TaggerEvent` shape the live worker
emits, and persists tags via the idempotent INSERT OR IGNORE
write. Re-running is safe — a second pass over identical
source rows reports ``inserted=0``.
Bus publish is intentionally suppressed; SIEM / webhook fan-out
sees only live events, never replays.
"""
from decnet.cli.gating import _require_master_mode
from decnet.web.dependencies import repo
_require_master_mode("ttp-backfill")
if source not in _BACKFILL_SOURCES:
console.print(
f"[red]invalid --source {source!r}; expected one of "
f"{_BACKFILL_SOURCES}[/]"
)
raise typer.Exit(code=2)
cutoff = datetime.now(tz=timezone.utc) - timedelta(days=since_days)
console.print(
f"[bold cyan]TTP backfill[/] since={cutoff.isoformat()} "
f"source={source} dry_run={dry_run} batch_size={batch_size}"
)
async def _run() -> None:
await repo.initialize()
await _backfill(
repo,
cutoff=cutoff,
sources=_resolve_sources(source),
dry_run=dry_run,
batch_size=batch_size,
)
try:
asyncio.run(_run())
except KeyboardInterrupt:
console.print("\n[yellow]Backfill interrupted.[/]")
def _resolve_sources(name: str) -> tuple[str, ...]:
if name == "all":
return ("command", "canary")
return (name,)
async def _backfill(
repo: Any,
*,
cutoff: datetime,
sources: tuple[str, ...],
dry_run: bool,
batch_size: int,
) -> None:
"""Drive the per-source backfill loops and report structured counts.
One :class:`CompositeTagger` is built once and reused for every
source — the per-lifter watch fan-out the live worker performs is
inlined here as a `watch_store()` startup task per
:class:`WatchableTagger`, so the dispatch indexes hydrate before
we start feeding events.
"""
# Import-time bound so tests can monkeypatch ``decnet.cli.ttp.get_tagger``
# to inject a recording fake without touching the global factory.
tagger = get_tagger()
watch_tasks: list[asyncio.Task[None]] = []
if isinstance(tagger, CompositeTagger):
for watchable in tagger.iter_watchables():
watch_tasks.append(asyncio.create_task(watchable.watch_store()))
# Yield once so each watch_store gets a chance to run its
# initial `load_compiled` before we feed the first event.
await asyncio.sleep(0.05)
try:
if "command" in sources:
await _backfill_commands(
repo, tagger, cutoff=cutoff,
dry_run=dry_run, batch_size=batch_size,
)
if "canary" in sources:
await _backfill_canaries(
repo, tagger, cutoff=cutoff,
dry_run=dry_run, batch_size=batch_size,
)
finally:
for task in watch_tasks:
task.cancel()
for task in watch_tasks:
try:
await task
except (asyncio.CancelledError, Exception): # noqa: BLE001
pass
async def _backfill_commands(
repo: Any,
tagger: Any,
*,
cutoff: datetime,
dry_run: bool,
batch_size: int,
) -> None:
from decnet.ttp.base import TaggerEvent
started = time.monotonic()
rows_seen = 0
cmds_seen = 0
inserted = 0
pending: list[Any] = []
async for attacker, commands in repo.iter_attacker_commands_since(cutoff):
rows_seen += 1
for idx, cmd in enumerate(commands):
cmds_seen += 1
text = cmd.get("command_text") or cmd.get("text")
if not isinstance(text, str):
continue
cmd_id = (
cmd.get("id")
or cmd.get("uuid")
or cmd.get("command_id")
or f"{attacker.uuid}#cmd{idx}"
)
event = TaggerEvent(
source_kind="command",
source_id=str(cmd_id),
attacker_uuid=attacker.uuid,
identity_uuid=getattr(attacker, "identity_id", None),
session_id=cmd.get("session_id"),
decky_id=cmd.get("decky_id") or cmd.get("decky"),
payload={**cmd, "command_text": text},
)
tags = await tagger.tag(event)
if tags:
pending.extend(tags)
if len(pending) >= batch_size:
inserted += await _flush(repo, pending, dry_run)
pending = []
if pending:
inserted += await _flush(repo, pending, dry_run)
elapsed = time.monotonic() - started
console.print(
f"source=command rows={rows_seen} commands={cmds_seen} "
f"inserted={inserted} dry_run={dry_run} elapsed_s={elapsed:.2f}"
)
async def _backfill_canaries(
repo: Any,
tagger: Any,
*,
cutoff: datetime,
dry_run: bool,
batch_size: int,
) -> None:
from decnet.ttp.base import TaggerEvent
started = time.monotonic()
rows_seen = 0
inserted = 0
pending: list[Any] = []
async for trigger in repo.iter_canary_triggers_since(cutoff):
rows_seen += 1
event = TaggerEvent(
source_kind="canary_fingerprint",
source_id=trigger.uuid,
attacker_uuid=trigger.attacker_id,
identity_uuid=None,
session_id=None,
decky_id=None,
payload={
"token_uuid": trigger.token_uuid,
"src_ip": trigger.src_ip,
"ua_signature": trigger.user_agent or "",
"user_agent": trigger.user_agent,
"request_path": trigger.request_path,
"dns_qname": trigger.dns_qname,
"headers": trigger.headers(),
},
)
tags = await tagger.tag(event)
if tags:
pending.extend(tags)
if len(pending) >= batch_size:
inserted += await _flush(repo, pending, dry_run)
pending = []
if pending:
inserted += await _flush(repo, pending, dry_run)
elapsed = time.monotonic() - started
console.print(
f"source=canary rows={rows_seen} inserted={inserted} "
f"dry_run={dry_run} elapsed_s={elapsed:.2f}"
)
async def _flush(repo: Any, tags: list[Any], dry_run: bool) -> int:
if dry_run:
return 0
return int(await repo.insert_tags(tags))

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import pathlib as _pathlib

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Shared CLI helpers: console, logger, process management, swarm HTTP client.
Submodules reference these as ``from . import utils`` then ``utils.foo(...)``
@@ -11,7 +12,7 @@ import signal
import subprocess # nosec B404
import sys
from pathlib import Path
from typing import Optional
from typing import Any, Callable, Optional
import typer
from rich.console import Console
@@ -96,7 +97,7 @@ def _is_running(match_fn) -> int | None:
return None
def _service_registry(log_file: str) -> list[tuple[str, callable, list[str]]]:
def _service_registry(log_file: str) -> list[tuple[str, Callable[..., Any], list[str]]]:
"""Return the microservice registry for health-check and relaunch.
On agents these run as systemd units invoking /usr/local/bin/decnet,
@@ -195,7 +196,7 @@ _DEFAULT_SWARMCTL_URL = "http://127.0.0.1:8770"
def _swarmctl_base_url(url: Optional[str]) -> str:
return url or os.environ.get("DECNET_SWARMCTL_URL", _DEFAULT_SWARMCTL_URL)
return url or os.environ.get("DECNET_SWARMCTL_URL") or _DEFAULT_SWARMCTL_URL
def _http_request(method: str, url: str, *, json_body: Optional[dict] = None, timeout: float = 30.0):

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import typer

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
import typer

View File

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
from __future__ import annotations
from typing import Optional
@@ -192,6 +193,70 @@ def register(app: typer.Typer) -> None:
except KeyboardInterrupt:
console.print("\n[yellow]Reuse correlator stopped.[/]")
@app.command(name="attribution")
def attribution(
multi_actor_tick_secs: float = typer.Option(
60.0, "--multi-actor-tick", "-t",
help=(
"Cross-primitive multi_actor correlator tick interval (seconds). "
"Walks attribution_state for identities flagged on >= 2 "
"primitives and emits attribution.profile.multi_actor_suspected."
),
),
daemon: bool = typer.Option(
False, "--daemon", "-d",
help="Detach to background as a daemon process",
),
) -> None:
"""Attribution engine v0 — per-(identity, primitive) state machine.
Subscribes to ``attacker.observation.>`` and, for each event,
ensures a stub identity row, runs the merger over the full
per-(identity, primitive) observation series, upserts the
derived state, and publishes
``attribution.profile.state_changed`` only on transition.
Periodic tick fires
``attribution.profile.multi_actor_suspected`` when >= 2
primitives flag the same identity.
Closes DEBT-051. Bright-line scope: behavioural coherence and
drift only — never persona attribution to natural persons.
"""
import asyncio
from decnet.correlation.attribution_worker import (
run_attribution_loop,
)
from decnet.web.dependencies import repo
if daemon:
log.info(
"attribution worker daemonizing tick=%s",
multi_actor_tick_secs,
)
_utils._daemonize()
log.info(
"attribution worker command invoked tick=%s",
multi_actor_tick_secs,
)
console.print(
f"[bold cyan]Attribution engine starting[/] "
f"multi_actor_tick={multi_actor_tick_secs}s"
)
console.print("[dim]Press Ctrl+C to stop[/]")
async def _run() -> None:
await repo.initialize()
await run_attribution_loop(
repo,
multi_actor_tick_secs=multi_actor_tick_secs,
)
try:
asyncio.run(_run())
except KeyboardInterrupt:
console.print("\n[yellow]Attribution engine stopped.[/]")
@app.command(name="clusterer")
def clusterer(
poll_interval_secs: float = typer.Option(
@@ -295,3 +360,10 @@ def register(app: typer.Typer) -> None:
asyncio.run(_run())
except KeyboardInterrupt:
console.print("\n[yellow]Campaign clusterer stopped.[/]")
# ``decnet ttp`` and ``decnet ttp-backfill`` moved to
# :mod:`decnet.cli.ttp` — the TTP CLI surface (worker + admin verbs)
# is colocated there, mirroring the per-feature CLI split used by
# :mod:`decnet.cli.canary`, :mod:`decnet.cli.webhook`, etc. The
# ``decnet-ttp.service`` systemd unit's ExecStart still resolves to
# ``decnet ttp`` because the command name is unchanged.

Some files were not shown because too many files have changed in this diff Show More