Replaces LICENSE (GPLv3 -> AGPLv3) and prepends
`SPDX-License-Identifier: AGPL-3.0-or-later` to every source file
across decnet/, decnet_web/, tests/, scripts/, and tools/.
Rationale: closes the GPLv3 ASP loophole so any party operating a
modified DECNET as a network service must offer their modified
source. Personal copyright (Samuel Paschuan) + inbound=outbound
contributions make a future unilateral relicense infeasible.
- LICENSE: full AGPL-3.0 text (gnu.org/licenses/agpl-3.0.txt)
- COPYRIGHT: project copyright notice
- tools/add_spdx_headers.py: idempotent header injector
(shebang- and PEP 263-aware)
Touches 1565 source files (.py, .ts, .tsx, .js, .jsx, .css, .sh).
No behavior change; comments only.
Four-part fix for the collection bottleneck that was blocking the dev loop:
1. Lazy mitreattack.stix20 import in attack_stix.py — deferred to first
_load() call (TYPE_CHECKING guard at top level)
2. Lazy misp_stix_converter import in both MISP export routers — moved
from module level into the route handler body
3. Lazy attack_catalog / attack_stix in ttp.py repo mixin — thin wrapper
functions so the import chain never fires at module load time
4. tests/api/conftest.py — `from decnet.web.api import app` moved inside
the `client()` fixture; `pytest_ignore_collect` broadened to skip all
test_schemathesis*.py variants (not just test_schemathesis.py), which
were launching a subprocess server at module-import time
5. pyproject.toml — `norecursedirs` for tests/live, tests/stress,
tests/service_testing, tests/docker, tests/perf so these directories
are never entered; `-m` filter removed from addopts (now redundant);
`--dist loadscope` → `--dist load` to unblock workers immediately
6. behave_core / behave_shell rename — BEHAVE packages dropped the
`decnet_` prefix; reinstalled editable installs and updated all 14
import sites across profiler, ttp, bus, and correlation modules
Add tick_multi_actor() — periodic walk of attribution_state firing
attribution.profile.multi_actor_suspected when an identity carries
>= MULTI_ACTOR_MIN_PRIMITIVES rows in multi_actor state.
* Repo's list_multi_actor_identities() already filters to >= 2
primitives; the correlator just dispatches.
* In-memory dedup keyed on identity_uuid -> frozenset(primitives):
same set as last fire -> no re-emit. Set grows -> re-emit.
Set shrinks below threshold -> evict so a future re-flap re-fires.
Restart-resets are honest because attribution_state persists; a
v1 multi_actor_suspect_log table can replace this if needed.
* run_attribution_loop() now supervises three concurrent tasks:
observation handler, multi_actor tick loop, health/control. Tick
interval comes from _thresholds.MULTI_ACTOR_TICK_SECS (60s) with
test override.
Tests: 6 scenarios — single-primitive doesn't fire, two-primitive
co-flag fires, dedup blocks unchanged set, set growth re-fires,
threshold drop re-arms, multiple identities fire independently.
attribution_worker.handle_observation_event now executes the full
end-to-end path:
* ensure stub identity (Phase 1)
* observations_for_identity_primitive() — new repo helper joining
observations through attackers.identity_id, so v1's clusterer
gets cross-attacker rollup for free
* aggregate_observations() with ValueKind dispatched off the BEHAVE
PRIMITIVE_REGISTRY; unknown primitives default to categorical
* upsert_attribution_state() — last_change_ts locked when state is
unchanged so the dashboard can render "stable since X"
* publish attribution.profile.state_changed only on transition;
idempotent re-runs over the same observation set fire nothing
(loop-prevention invariant matching ttp.tagged)
Tests:
* 5 end-to-end attribution scenarios over in-memory SQLite + FakeBus.
* test_base_repo's DummyRepo + coverage body now stub every abstract
surface BaseRepository declares — the 6 added by this branch plus
the 12 left un-stubbed by earlier work (BEHAVE Phase 1, TTP
rollups, iter helpers). The coverage test could not previously
even instantiate.
* test_aggregate_categorical's dispatcher rejection updated for the
Phase 3 + 4 contract — ValueError on unknown kinds, not
NotImplementedError.
aggregate_numeric(): EWMA + dispersion (CV) over numeric primitive
values. Stable when CV < 20% AND mean shift < 30%; drifting on >= 30%
mean shift; conflicted on CV > 100%. Confidence is 1 - min(CV, 1).
multi_actor is intentionally NOT a numeric state — bimodal
distributions belong to the categorical detector once the value space
is bucketed.
aggregate_hash(): counts distinct hash values within
HASH_DRIFT_WINDOW_SECS of the most recent observation. 0 rotations =
stable, 1..HASH_DRIFT_MAX = drifting, > HASH_DRIFT_MAX = conflicted.
Reads rotation events; never recomputes hashes (DEBT-032 already
produces them via decnet.correlation.fingerprint_rotation).
aggregate_observations() dispatcher now routes "categorical" |
"numeric" | "hash" | None and rejects unknown kinds with ValueError
(louder than NotImplementedError now that all three v0 mergers
exist). 17 synthetic-input tests cover both new mergers and the
dispatcher.
aggregate_categorical(): pure function over a per-(identity, primitive)
observation list. Five-state vocabulary, last-N=5 window comparison
with one-outlier-tolerant majority threshold:
* unknown — < 3 observations
* stable — recent 5 agree (≥ 4 of 5 share top value), older 5 same
* drifting — recent 5 stable but disagrees with older 5, or older
was conflicted and recent stabilised
* conflicted — recent 5 split, no two-value alternation pattern
* multi_actor — recent 5 split + alternation between exactly two
values (operator A↔B handoff). Confidence capped at 0.6 per
_thresholds.MULTI_ACTOR_MAX_CONFIDENCE; flapping primitives on
flaky networks would otherwise look like two operators.
aggregate_observations() dispatcher honours value_kind="categorical"
(or None) and raises NotImplementedError for "numeric" / "hash" so
Phase 3 lands cleanly. 14 synthetic-input tests cover every state
+ boundary condition.
v0 Phase 1 of ATTRIBUTION-ENGINE.md:
* AttributionStateRow SQLModel keyed on (identity_uuid, primitive)
per ANTI direction — re-keying state rows when the v1 clusterer
merges attackers is the migration debt v0 should not bake in.
ATTRIBUTION-ENGINE.md updated with the deviation note.
* AttributionMixin: ensure_stub_identity_for_attacker, idempotent
upsert_attribution_state, get_attribution_state[_for_identity],
list_multi_actor_identities (the Phase 5 correlator's read).
* attribution.profile.{state_changed,multi_actor_suspected} bus
topics + builder; wiki Service-Bus.md updated separately.
* attribution_worker.py: subscribes to attacker.observation.>,
ensures stub identity per event, logs and continues. No merger,
no state writes, no derived events — Phase 4 wires those.
* attribution/{aggregate.py,_thresholds.py} skeletons: Phase 2
fills _aggregate_categorical, Phase 3 adds numeric+hash+dispatcher.
When the prober observes a NEW hash for an
(attacker_uuid, port, probe_type) triple it has seen before — VPS
rotation, SSH server rebuild, TLS cert swap — emit a derived
attacker.fingerprint_rotated event carrying both old and new hash.
Detection is a small library (decnet.correlation.fingerprint_rotation)
called inline from the prober at each of the three emit sites
(JARM/HASSH/TCPFP). No new daemon. New AttackerFingerprintState table
holds per-triple last-hash state; Attacker.rotation_count and
Attacker.last_rotation_at are stamped on every diff. Library is sync,
fully unit-tested via injected publish_fn / syslog_fn callbacks.
Honeypot SSH containers run `PROMPT_COMMAND` that calls
`logger --rfc5424 --msgid command -t bash "CMD …"`. The Docker-stdout
reader prepends an outer RFC5424 envelope (HOSTNAME=<decky>,
APP-NAME=1, MSGID=NIL) around that inner syslog line. Both the
collector parser (`parse_rfc5424`) and the correlation parser
(`parse_line`) saw the outer NIL MSGID and emitted `event_type="-"`
for every shell command — which:
- kept `Attacker.commands` rows missing `command_text`
- left R0001–R0030 (the pattern rule pack that matches shell
commands) with no haystack
- made `decnet.collector.log` show `event written … type=-`
for the very lines that should be `type=command`
Both parsers now detect the inner-RFC5424 shape (`<TS> <HOST> <APP>
<PROCID> <MSGID> <rest>`) when the outer MSGID is NIL and the SD-arm
is also NIL, and re-extract HOSTNAME / APP-NAME / MSGID / remainder
from the body. The collector parser also recovers the post-SD msg
tail when the SD block isn't `relay@55555` (the bash CMD line carries
a `[timeQuality …]` block) so the kv-fallback can find `src_ip`.
Mirroring tests in tests/collector and tests/correlation pin both
the unwrap and the regression guard for non-double-wrapped lines.
fix(generator): correct service pool count in _SVC_MIN/_SVC_MAX comment
BLE001 is not in ruff.toml select (F/ANN/RUF/E/W only); the suppressions
were whispering apologies to a linter that wasn't listening. Generator
comment now cites the actual ~28-entry non-singleton service pool.
Adds probe_forwarded to meaningful event kinds and stores it in the
bounty table as bounty_type=probe_relay with forwarded=true/false, so
the dashboard shows whether the upstream actually accepted the test email.
The Dockerfile PROMPT_COMMAND logger uses --msgid command, so the MSGID
field arrives as 'command' not '-'. The CMD rewrite block was guarded by
event_type == '-' so it never fired, leaving fields['command'] unpopulated
and cmd_text=None for every SSH session command.
Broaden the guard to also match event_type == 'command' with no existing
'command' field, which covers both the intended (MSGID=NIL) and actual
(MSGID=command) wire formats.
SSH/telnet decky containers emit shell commands via `logger -t bash "CMD …"`
which produces RFC 5424 lines with MSGID=NIL. Both parsers were leaving
event_type="-", so the behavioral profiler's `_COMMAND_EVENT_TYPES` filter
silently dropped them — the IP profile existed but no command transcripts
or artifacts. Confirmed in the wild: 44/48 events from one attacker were
event_type="-".
Rewrite event_type to "command" in both parsers when MSGID=NIL and the
msg starts with "CMD ". Correlation parser also extracts the cmd= payload
into fields["command"] so the profiler can build the transcript; collector
parser leaves fields={} to avoid duplicate pills in the dashboard.