Replaces LICENSE (GPLv3 -> AGPLv3) and prepends
`SPDX-License-Identifier: AGPL-3.0-or-later` to every source file
across decnet/, decnet_web/, tests/, scripts/, and tools/.
Rationale: closes the GPLv3 ASP loophole so any party operating a
modified DECNET as a network service must offer their modified
source. Personal copyright (Samuel Paschuan) + inbound=outbound
contributions make a future unilateral relicense infeasible.
- LICENSE: full AGPL-3.0 text (gnu.org/licenses/agpl-3.0.txt)
- COPYRIGHT: project copyright notice
- tools/add_spdx_headers.py: idempotent header injector
(shebang- and PEP 263-aware)
Touches 1565 source files (.py, .ts, .tsx, .js, .jsx, .css, .sh).
No behavior change; comments only.
Four synthetic operator-behaviour scenarios at the merger level
(aggregate_observations) that pin v0's calibration:
* Stable HUMAN over 7 sessions -> all primitives stable
* HUMAN switches to LLM mid-week -> primitives flip stable -> drifting
* Two operators alternating -> primitives flag multi_actor
(per-primitive; the cross-
primitive multi_actor_suspected
correlator is exercised by Phase 5)
* Single short session -> all primitives unknown
Plus a threshold-lockdown test that asserts every named constant in
_thresholds.py against its v0 ship value. Anyone adjusting a
threshold without updating the scenarios fails this file.
This closes DEBT-051 at v0 — the attribution engine has a calibrated,
test-locked answer to "is this attacker stable / drifting / showing
multiple operators?" without crossing the persona-attribution bright
line. v1 (cross-attacker clustering, KD simhash linkage signal) is
gated on this v0 surface being stable in production for >= 1 month.
Add tick_multi_actor() — periodic walk of attribution_state firing
attribution.profile.multi_actor_suspected when an identity carries
>= MULTI_ACTOR_MIN_PRIMITIVES rows in multi_actor state.
* Repo's list_multi_actor_identities() already filters to >= 2
primitives; the correlator just dispatches.
* In-memory dedup keyed on identity_uuid -> frozenset(primitives):
same set as last fire -> no re-emit. Set grows -> re-emit.
Set shrinks below threshold -> evict so a future re-flap re-fires.
Restart-resets are honest because attribution_state persists; a
v1 multi_actor_suspect_log table can replace this if needed.
* run_attribution_loop() now supervises three concurrent tasks:
observation handler, multi_actor tick loop, health/control. Tick
interval comes from _thresholds.MULTI_ACTOR_TICK_SECS (60s) with
test override.
Tests: 6 scenarios — single-primitive doesn't fire, two-primitive
co-flag fires, dedup blocks unchanged set, set growth re-fires,
threshold drop re-arms, multiple identities fire independently.
attribution_worker.handle_observation_event now executes the full
end-to-end path:
* ensure stub identity (Phase 1)
* observations_for_identity_primitive() — new repo helper joining
observations through attackers.identity_id, so v1's clusterer
gets cross-attacker rollup for free
* aggregate_observations() with ValueKind dispatched off the BEHAVE
PRIMITIVE_REGISTRY; unknown primitives default to categorical
* upsert_attribution_state() — last_change_ts locked when state is
unchanged so the dashboard can render "stable since X"
* publish attribution.profile.state_changed only on transition;
idempotent re-runs over the same observation set fire nothing
(loop-prevention invariant matching ttp.tagged)
Tests:
* 5 end-to-end attribution scenarios over in-memory SQLite + FakeBus.
* test_base_repo's DummyRepo + coverage body now stub every abstract
surface BaseRepository declares — the 6 added by this branch plus
the 12 left un-stubbed by earlier work (BEHAVE Phase 1, TTP
rollups, iter helpers). The coverage test could not previously
even instantiate.
* test_aggregate_categorical's dispatcher rejection updated for the
Phase 3 + 4 contract — ValueError on unknown kinds, not
NotImplementedError.
aggregate_numeric(): EWMA + dispersion (CV) over numeric primitive
values. Stable when CV < 20% AND mean shift < 30%; drifting on >= 30%
mean shift; conflicted on CV > 100%. Confidence is 1 - min(CV, 1).
multi_actor is intentionally NOT a numeric state — bimodal
distributions belong to the categorical detector once the value space
is bucketed.
aggregate_hash(): counts distinct hash values within
HASH_DRIFT_WINDOW_SECS of the most recent observation. 0 rotations =
stable, 1..HASH_DRIFT_MAX = drifting, > HASH_DRIFT_MAX = conflicted.
Reads rotation events; never recomputes hashes (DEBT-032 already
produces them via decnet.correlation.fingerprint_rotation).
aggregate_observations() dispatcher now routes "categorical" |
"numeric" | "hash" | None and rejects unknown kinds with ValueError
(louder than NotImplementedError now that all three v0 mergers
exist). 17 synthetic-input tests cover both new mergers and the
dispatcher.
aggregate_categorical(): pure function over a per-(identity, primitive)
observation list. Five-state vocabulary, last-N=5 window comparison
with one-outlier-tolerant majority threshold:
* unknown — < 3 observations
* stable — recent 5 agree (≥ 4 of 5 share top value), older 5 same
* drifting — recent 5 stable but disagrees with older 5, or older
was conflicted and recent stabilised
* conflicted — recent 5 split, no two-value alternation pattern
* multi_actor — recent 5 split + alternation between exactly two
values (operator A↔B handoff). Confidence capped at 0.6 per
_thresholds.MULTI_ACTOR_MAX_CONFIDENCE; flapping primitives on
flaky networks would otherwise look like two operators.
aggregate_observations() dispatcher honours value_kind="categorical"
(or None) and raises NotImplementedError for "numeric" / "hash" so
Phase 3 lands cleanly. 14 synthetic-input tests cover every state
+ boundary condition.
v0 Phase 1 of ATTRIBUTION-ENGINE.md:
* AttributionStateRow SQLModel keyed on (identity_uuid, primitive)
per ANTI direction — re-keying state rows when the v1 clusterer
merges attackers is the migration debt v0 should not bake in.
ATTRIBUTION-ENGINE.md updated with the deviation note.
* AttributionMixin: ensure_stub_identity_for_attacker, idempotent
upsert_attribution_state, get_attribution_state[_for_identity],
list_multi_actor_identities (the Phase 5 correlator's read).
* attribution.profile.{state_changed,multi_actor_suspected} bus
topics + builder; wiki Service-Bus.md updated separately.
* attribution_worker.py: subscribes to attacker.observation.>,
ensures stub identity per event, logs and continues. No merger,
no state writes, no derived events — Phase 4 wires those.
* attribution/{aggregate.py,_thresholds.py} skeletons: Phase 2
fills _aggregate_categorical, Phase 3 adds numeric+hash+dispatcher.
When the prober observes a NEW hash for an
(attacker_uuid, port, probe_type) triple it has seen before — VPS
rotation, SSH server rebuild, TLS cert swap — emit a derived
attacker.fingerprint_rotated event carrying both old and new hash.
Detection is a small library (decnet.correlation.fingerprint_rotation)
called inline from the prober at each of the three emit sites
(JARM/HASSH/TCPFP). No new daemon. New AttackerFingerprintState table
holds per-triple last-hash state; Attacker.rotation_count and
Attacker.last_rotation_at are stamped on every diff. Library is sync,
fully unit-tested via injected publish_fn / syslog_fn callbacks.
Three producer-side regression guards. Each drives the worker's run
loop with a fake bus + stubbed repo and asserts the documented topic
fires when the producer has data:
- reuse correlator → credential.reuse.detected (one finding row)
- clusterer → identity.formed + identity.merged (one ClusterResult)
- intel worker → attacker.intel.enriched (one unenriched attacker
+ a fake provider returning a "malicious" verdict)
These complement commit 1's attacker.session.ended producer test —
together the four cover every TTP-relevant publisher in the tree
(modulo email.received, which has no producer yet; tracked in
DEBT.md).
Honeypot SSH containers run `PROMPT_COMMAND` that calls
`logger --rfc5424 --msgid command -t bash "CMD …"`. The Docker-stdout
reader prepends an outer RFC5424 envelope (HOSTNAME=<decky>,
APP-NAME=1, MSGID=NIL) around that inner syslog line. Both the
collector parser (`parse_rfc5424`) and the correlation parser
(`parse_line`) saw the outer NIL MSGID and emitted `event_type="-"`
for every shell command — which:
- kept `Attacker.commands` rows missing `command_text`
- left R0001–R0030 (the pattern rule pack that matches shell
commands) with no haystack
- made `decnet.collector.log` show `event written … type=-`
for the very lines that should be `type=command`
Both parsers now detect the inner-RFC5424 shape (`<TS> <HOST> <APP>
<PROCID> <MSGID> <rest>`) when the outer MSGID is NIL and the SD-arm
is also NIL, and re-extract HOSTNAME / APP-NAME / MSGID / remainder
from the body. The collector parser also recovers the post-SD msg
tail when the SD block isn't `relay@55555` (the bash CMD line carries
a `[timeQuality …]` block) so the kv-fallback can find `src_ip`.
Mirroring tests in tests/collector and tests/correlation pin both
the unwrap and the regression guard for non-double-wrapped lines.
SSH/telnet decky containers emit shell commands via `logger -t bash "CMD …"`
which produces RFC 5424 lines with MSGID=NIL. Both parsers were leaving
event_type="-", so the behavioral profiler's `_COMMAND_EVENT_TYPES` filter
silently dropped them — the IP profile existed but no command transcripts
or artifacts. Confirmed in the wild: 44/48 events from one attacker were
event_type="-".
Rewrite event_type to "command" in both parsers when MSGID=NIL and the
msg starts with "CMD ". Correlation parser also extracts the cmd= payload
into fields["command"] so the profiler can build the transcript; collector
parser leaves fields={} to avoid duplicate pills in the dashboard.