Replaces LICENSE (GPLv3 -> AGPLv3) and prepends
`SPDX-License-Identifier: AGPL-3.0-or-later` to every source file
across decnet/, decnet_web/, tests/, scripts/, and tools/.
Rationale: closes the GPLv3 ASP loophole so any party operating a
modified DECNET as a network service must offer their modified
source. Personal copyright (Samuel Paschuan) + inbound=outbound
contributions make a future unilateral relicense infeasible.
- LICENSE: full AGPL-3.0 text (gnu.org/licenses/agpl-3.0.txt)
- COPYRIGHT: project copyright notice
- tools/add_spdx_headers.py: idempotent header injector
(shebang- and PEP 263-aware)
Touches 1565 source files (.py, .ts, .tsx, .js, .jsx, .css, .sh).
No behavior change; comments only.
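As a rough illustration of what an idempotent, shebang- and PEP 263-aware injector might look like for `#`-comment files (a sketch only; `add_spdx_header` and the five-line scan window are assumptions, not the contents of tools/add_spdx_headers.py):

```python
import re

SPDX_LINE = "# SPDX-License-Identifier: AGPL-3.0-or-later"
# PEP 263 encoding declaration; only valid on line 1 or 2 of a file.
_CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*[-_.a-zA-Z0-9]+")


def add_spdx_header(source: str) -> str:
    """Insert the SPDX line; return the input unchanged if already present."""
    lines = source.splitlines(keepends=True)
    # Idempotence: a header near the top means there is nothing to do.
    if any(line.strip() == SPDX_LINE for line in lines[:5]):
        return source
    insert_at = 0
    if lines and lines[0].startswith("#!"):
        insert_at = 1  # keep the shebang on line 1
    if len(lines) > insert_at and _CODING_RE.match(lines[insert_at]):
        insert_at += 1  # keep the coding cookie within the first two lines
    if insert_at and not lines[insert_at - 1].endswith("\n"):
        lines[insert_at - 1] += "\n"
    lines.insert(insert_at, SPDX_LINE + "\n")
    return "".join(lines)
```

Running the function twice on the same text yields the same result, which is what makes a repo-wide re-run safe.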
Every compose invocation used -p decnet so fleet + every topology
lived in one docker compose project. --remove-orphans, run during
fleet pre-up cleanup and on every topology teardown / rollback, then
swept every container in the project not listed in the current compose
file — wiping sibling topologies and the flat fleet along with the
intended target.
Parameterize project on _compose / _compose_with_retry / _compose_ps
(default FLEET_COMPOSE_PROJECT="decnet"). Add _topology_compose_project
that returns decnet-topo-<id8>, and pass it through every topology
compose call site (master deploy_topology + rollback + post-deploy ps,
master teardown_topology, agent apply, agent teardown, all four live
service mutations on topology deckies). Fleet calls keep the default
and are unaffected.
Migration: live containers from before this fix remain in the shared
"decnet" project, where the new topology code paths cannot reach them;
they need a one-time manual cleanup.
The wizard POSTs only the new decky on each submit. The handler used to
treat every INI as the complete desired fleet (config.deckies = INI) so
the reconciler tore down prior deckies as orphans — deploying a second
Windows workstation silently wiped the first.
Add replace_fleet to DeployIniRequest (default false). Default path
merges new deckies into existing config and rejects name/IP collisions
with 409. replace_fleet=true preserves set-desired-state semantics for
CLI / declarative callers. Lifecycle rows are created only for the
deckies submitted in the current call, so /deckies/lifecycle?ids=...
reflects exactly what this submit deployed.
build_deckies_from_ini gains reserved_ips so additive auto-allocation
skips IPs already held by the existing fleet.
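The additive path can be pictured roughly as follows. This is a sketch under assumptions: deckies are plain dicts with "name" and "ip" keys, and `FleetCollision`, `merge_deckies`, and `allocate_ip` are illustrative names rather than the handler's real ones.

```python
import ipaddress


class FleetCollision(Exception):
    """Would be translated to HTTP 409 by the handler."""


def merge_deckies(existing, submitted, replace_fleet=False):
    if replace_fleet:
        return list(submitted)  # set-desired-state semantics
    names = {d["name"] for d in existing}
    ips = {d["ip"] for d in existing}
    for decky in submitted:
        if decky["name"] in names:
            raise FleetCollision(f"name already in fleet: {decky['name']}")
        if decky["ip"] in ips:
            raise FleetCollision(f"IP already in fleet: {decky['ip']}")
        names.add(decky["name"])  # also catches duplicates within one submit
        ips.add(decky["ip"])
    return list(existing) + list(submitted)


def allocate_ip(subnet, reserved_ips):
    # Skip every address the existing fleet already holds.
    reserved = {ipaddress.ip_address(ip) for ip in reserved_ips}
    for host in ipaddress.ip_network(subnet).hosts():
        if host not in reserved:
            return str(host)
    raise RuntimeError("subnet exhausted")
```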
GET /deckies/lifecycle?ids=<uuid>&ids=<uuid> returns the matching
DeckyLifecycle rows so the wizard can poll instead of holding an HTTP
request open across compose work. Gated by require_viewer; read-only.
Startup sweep: on master boot, any pending/running row with
started_at older than 1h flips to failed with
error='master restarted during operation'. Pre-v1 substitute for a
durable task queue: if the master crashes mid-deploy, the wizard sees
FAILED on refresh and the operator retries. Idempotent + cheap; runs
unconditionally including in contract-test mode.
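The sweep logic is small enough to sketch over in-memory rows. `LifecycleRow` and `sweep_stale` are illustrative names; the real implementation goes through the repository's sweep_stale_lifecycle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(hours=1)
OPEN_STATES = {"pending", "running"}


@dataclass
class LifecycleRow:
    id: str
    status: str
    started_at: datetime
    error: Optional[str] = None


def sweep_stale(rows, now=None) -> int:
    """Flip open rows older than STALE_AFTER to failed; return the count."""
    now = now or datetime.now(timezone.utc)
    flipped = 0
    for row in rows:
        if row.status in OPEN_STATES and now - row.started_at > STALE_AFTER:
            row.status = "failed"
            row.error = "master restarted during operation"
            flipped += 1
    return flipped
```

A second pass finds no open stale rows and changes nothing, which is why running it on every boot is harmless.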
This is the unblock for the wizard hang. Both endpoints used to run
docker compose synchronously inside the HTTP handler -- on master
(unihost) or via asyncio.gather of worker /deploy POSTs at 600s
timeout each (swarm) -- blocking every other API request.
New flow:
1. Commit the new config shape to repo state (fast).
2. Create one DeckyLifecycle row per decky (status=pending).
3. Spawn asyncio.create_task(run_deploy / run_mutate) -- the
lifecycle runner drives rows through running -> succeeded|failed
and emits decky.<name>.lifecycle on the bus.
4. Return 202 with {lifecycle_ids: [...]}. Wizard polls
GET /deckies/lifecycle?ids=... (next commit).
mutator/engine.py gains pick_new_services() -- shared between the
async API path and the watch-loop's synchronous mutate_decky().
DeployResponse grows lifecycle_ids[]. The old dispatch_decnet_config
helper still exists for the CLI swarm-deploy command path; it just
isn't called from the API handler anymore.
Test changes: 200 -> 202, drop dispatch_decnet_config mocks (handler
no longer calls it), assert lifecycle_ids in response + committed
state matches expectations.
HeartbeatRequest grows an optional lifecycle field carrying per-decky
completion records from the worker:
[{decky_name, operation, status, error?, completed_at?}]
For each delta, the master finds the most-recently-started open
DeckyLifecycle row for (decky_name, operation, host_uuid) and flips
it to terminal with the worker's error text + timestamp. Stale
duplicates (row already sealed or never existed) are logged and
dropped -- not errors.
Each successful pivot also emits decky.<name>.lifecycle on the bus
so the dashboard sees the transition without waiting for its next
poll tick.
This is the master-side completion channel for the worker's 202
fire-and-forget /deploy and /mutate.
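A sketch of the pivot, assuming rows are dicts and started_at is any comparable timestamp; `apply_delta` is an illustrative name for what the heartbeat handler does per delta:

```python
import logging

log = logging.getLogger("lifecycle.sketch")
OPEN_STATES = {"pending", "running"}


def apply_delta(rows, host_uuid, delta):
    """Seal the newest open row matching the delta; return it, or None."""
    open_rows = [
        r for r in rows
        if r["decky_name"] == delta["decky_name"]
        and r["operation"] == delta["operation"]
        and r["host_uuid"] == host_uuid
        and r["status"] in OPEN_STATES
    ]
    if not open_rows:
        # Stale duplicate: already sealed or never existed. Not an error.
        log.info("dropping stale lifecycle delta: %r", delta)
        return None
    row = max(open_rows, key=lambda r: r["started_at"])
    row.update(
        status=delta["status"],
        error=delta.get("error"),
        completed_at=delta.get("completed_at"),
    )
    return row
```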
The wizard API used to hang because /deckies/deploy ran docker compose
build && up -d synchronously, holding the request thread for minutes.
The worker side of that pipeline now returns 202 Accepted immediately
and runs the deploy in an asyncio.create_task.
On task completion (success or failure) the worker pushes a one-off
heartbeat carrying a lifecycle delta per decky:
{decky_name, operation, status: succeeded|failed, error?, completed_at}
Master pivots these onto open DeckyLifecycle rows in the heartbeat
handler (next commit). The scheduled 30s heartbeat tick is the
fallback if the immediate push drops.
- decnet/agent/app.py: /deploy and /mutate return 202; dry_run mutate
still validates synchronously and returns 200.
- decnet/agent/executor.py: deploy_async + mutate_async wrap the work
and push the completion delta.
- decnet/agent/heartbeat.py: push_lifecycle_delta() helper builds a
one-off body and POSTs with the same mTLS context as the loop.
- decnet/swarm/client.py: revert deploy/mutate to control timeout
(master no longer holds the HTTP request open for compose work).
Worker state.json gains no lifecycle field -- master DeckyLifecycle is
the source of truth; the master sweep handles crashed-mid-deploy
recovery.
Add decnet.lifecycle package: pure orchestration layer that the
master API will invoke via asyncio.create_task to drive DeckyLifecycle
rows through pending -> running -> succeeded | failed without
holding an HTTP request open.
Strategy classes per (operation, transport):
- LocalDeployStrategy: master-resident, runs engine.deployer.deploy
in a thread.
- SwarmDeployStrategy: shards by host_uuid, dispatches via
AgentClient.deploy; worker drives terminal via heartbeat.
- LocalMutateStrategy: write_compose + compose up.
- SwarmMutateStrategy: AgentClient.mutate; worker drives terminal.
decnet.bus.topics gains decky_lifecycle(name) -> decky.<name>.lifecycle
plus DECKY_LIFECYCLE constant. Payload documented in the wiki
(separate commit). publish_safely keeps bus best-effort.
Nothing is wired to call this yet -- next commits convert worker
/deploy /mutate to 202, then heartbeat delta wiring, then master API.
One row per (decky, operation) attempt. State machine:
pending -> running -> succeeded | failed (+ error text). Rows are
append-only after terminal; retries write a new row.
Sibling of DeckyShard rather than a rework -- DeckyShard tracks
runtime container state observed via heartbeat, this tracks
operation lifecycle. New table, UUID PK.
Adds BaseRepository abstract methods (create_lifecycle,
update_lifecycle, get_lifecycle_by_ids, find_open_lifecycle,
sweep_stale_lifecycle) with SQLModelRepository mixin impl.
Backbone for the upcoming 202-Accepted async API.
- Implement /mutate handler: load_state, update services + last_mutated,
save_state, write_compose, compose up -d via asyncio.to_thread. 404
for missing state / unknown decky_id. dry_run short-circuits before
any side effect.
- Add AgentClient.mutate(decky_id, services, *, dry_run=False) using
_TIMEOUT_DEPLOY (compose up can pull/build, exceeds control timeout).
- mutator/engine.py: in swarm mode with decky.host_uuid set, resolve
worker via _resolve_swarm_host and dispatch through AgentClient.mutate
instead of writing a compose file on master. Master-resident deckies
(unihost mode, or swarm with host_uuid=None) keep the local path.
Adds state_path ServiceConfigField and passes DNS_STATE_PATH into the
container environment. Operator must mount the parent directory on a
volume for persistence to survive container recreation.
Switch burst deque from monotonic() to time.time() (wall-clock, serializable).
Add DNS_STATE_PATH env var: on startup _load_state() reads {src:[ts,...]} JSON
and prunes entries older than the burst window. _flush_state() write-then-renames
atomically; _state_flusher() coroutine flushes every 5s when dirty. Detection of
the 5th event also triggers an immediate flush. No-op when DNS_STATE_PATH is
unset, so the default deployment is unchanged.
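The load-prune and write-then-rename steps might look like this. It is a sketch: the function names drop the leading underscore, and `window` is passed explicitly because the burst window length is not stated here.

```python
import json
import os
import tempfile
import time


def load_state(path, window, now=None):
    """Read {src: [ts, ...]} and drop timestamps older than the window."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as fh:
            raw = json.load(fh)
    except (OSError, ValueError):
        return {}  # missing or corrupt file: start empty
    if not isinstance(raw, dict):
        return {}
    state = {}
    for src, stamps in raw.items():
        fresh = [ts for ts in stamps if now - ts <= window]
        if fresh:
            state[src] = fresh
    return state


def flush_state(path, state):
    """Write to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".dns-state-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
        os.replace(tmp, path)  # atomic when both paths share a filesystem
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

Creating the temp file next to the target keeps the rename on one filesystem, so a reader sees either the old file or the new one, never a partial write.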
Rename _txt_times -> _tunnel_times. Add TYPE_CNAME=5, TYPE_NULL=10,
TYPE_PRIVATE=65399 constants. Guard burst counter with _TUNNEL_QTYPES
frozenset instead of TYPE_TXT only. Mixed-type queries from one source
now share a single burst window, closing iodine NULL/CNAME downlink
and AAAA-encoded uplink evasion gaps.
_is_tunneling now returns str|None (the detection method) instead of bool.
Two new tunables _QNAME_TOTAL_LEN_THRESHOLD=50 and _QNAME_ENTROPY_THRESHOLD=3.5
catch attackers who split a high-entropy payload across multiple short labels.
tunnel_method field added to tunneling_suspect events for downstream correlation.
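To show how a whole-qname entropy check catches payloads split across short labels, here is a sketch using the two thresholds named above. The function name `classify_qname`, the returned method string, and the choice of >= comparisons are assumptions.

```python
import math
from collections import Counter
from typing import Optional

_QNAME_TOTAL_LEN_THRESHOLD = 50
_QNAME_ENTROPY_THRESHOLD = 3.5


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def classify_qname(qname: str) -> Optional[str]:
    """Return a detection-method string, or None when nothing looks off."""
    labels = [label for label in qname.rstrip(".").split(".") if label]
    joined = "".join(labels)  # judge the name as a whole, not per label
    if (len(joined) >= _QNAME_TOTAL_LEN_THRESHOLD
            and shannon_entropy(joined) >= _QNAME_ENTROPY_THRESHOLD):
        return "qname_entropy"  # hypothetical method name
    return None
```

Each label in a split payload can stay well under any per-label length limit while the joined name still crosses both thresholds.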
_parse_edns_size only extracted the requestor UDP size; every other field in
the OPT record (DO bit, EDNS version, extended RCODE, all sub-options) was
invisible. Replaced with _parse_opt_record returning a full dict:
udp_size, ext_rcode, version, do_bit, z, options[(code, len, data)]
NSID request (option code 3) is now detected as fingerprint_probe with
probe=edns_nsid and contributes to recon_burst. DO bit, COOKIE (10), and
other options are not escalated; udp_size continues to drive amp_probe.
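For reference, the OPT pseudo-RR layout from RFC 6891 decodes as sketched below. `parse_opt_fields` is a hypothetical helper that takes the already-extracted CLASS, TTL, and RDATA rather than a raw packet, so it is not the signature of _parse_opt_record.

```python
import struct

OPT_NSID = 3     # RFC 5001
OPT_COOKIE = 10  # RFC 7873


def parse_opt_fields(rr_class: int, rr_ttl: int, rdata: bytes) -> dict:
    """Decode an OPT pseudo-RR from its CLASS, TTL, and RDATA fields."""
    flags = rr_ttl & 0xFFFF
    options = []
    offset = 0
    while offset < len(rdata):
        if offset + 4 > len(rdata):
            raise ValueError("truncated EDNS option header")
        code, length = struct.unpack_from("!HH", rdata, offset)
        offset += 4
        if offset + length > len(rdata):
            raise ValueError("truncated EDNS option data")
        options.append((code, length, rdata[offset:offset + length]))
        offset += length
    return {
        "udp_size": rr_class,                # CLASS carries the UDP payload size
        "ext_rcode": (rr_ttl >> 24) & 0xFF,  # upper 8 bits of the TTL
        "version": (rr_ttl >> 16) & 0xFF,
        "do_bit": bool(flags & 0x8000),
        "z": flags & 0x7FFF,
        "options": options,
    }


def requests_nsid(opt: dict) -> bool:
    return any(code == OPT_NSID for code, _length, _data in opt["options"])
```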
Tools like fpdns send OPCODE=IQUERY/STATUS/NOTIFY/UPDATE or set the reserved
Z bit to fingerprint resolver behaviour. Previously all these were parsed as
standard queries with no signal.
- opcode!=0 → fingerprint_probe probe=opcode_<name>, NOTIMP response;
fired before qdcount check so qdcount=0 UPDATE packets are still caught.
- Z bit set OR (AD+CD without RD) → fingerprint_probe probe=header_flags;
AD alone with RD is ignored to avoid tagging DNSSEC-aware stubs.
- Both variants contribute to recon_burst.
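The header checks reduce to a few bit tests on the 16-bit flags word (RFC 1035 layout, with AD and CD from RFC 4035). A sketch, with `classify_header` as an illustrative name and lowercase opcode names assumed for the probe string:

```python
from typing import Optional

_OPCODE_NAMES = {1: "iquery", 2: "status", 4: "notify", 5: "update"}

_RD = 0x0100
_Z = 0x0040   # reserved bit, must be zero in normal traffic
_AD = 0x0020
_CD = 0x0010


def classify_header(flags: int) -> Optional[str]:
    """Return the probe name for a suspicious header, else None."""
    opcode = (flags >> 11) & 0xF
    if opcode != 0:
        return "opcode_" + _OPCODE_NAMES.get(opcode, str(opcode))
    rd, z = bool(flags & _RD), bool(flags & _Z)
    ad, cd = bool(flags & _AD), bool(flags & _CD)
    if z or (ad and cd and not rd):
        return "header_flags"
    return None  # includes AD alone with RD: a DNSSEC-aware stub
```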
qclass=255 in a standard query is unusual enough to be a fingerprinting probe
(fpdns, various scanner scripts). Previously it was logged as a plain query
with qclass=ANY in the event field; now it emits fingerprint_probe with
probe=qclass_any and returns REFUSED — consistent with how we treat other
probe types. Contributes to recon_burst.
The inline probe_map dict inside _handle made tests blind to the probe
catalogue and couldn't be extended without touching the hot path. It is now
module-level _CHAOS_PROBE_MAP. authors.bind. joins the three existing entries
so it gets named correctly instead of carrying the raw qname.
Packets with multiple questions were silently parsed at q0 only; the extra
questions were invisible. Now emits multi_question at severity=5 with the
qdcount and q0 qname, then falls through and answers q0 normally.
Silent drops on <12B packets, qdcount=0, and question-section ValueError gave
fuzzers and scanners a completely dark target. New events malformed_packet,
empty_question_section, and question_parse_error fire at severity=5 so these
probes are visible without counting toward recon_burst.
Adds DNS_FORWARD_BUDGET (default 50) and DNS_FORWARD_WINDOW (default 1.0s)
env vars. _can_forward() maintains a rolling deque of upstream call
timestamps; queries that exceed the budget within the window are answered
with the sinkhole (127.x) instead of being forwarded, making the honeypot
ineligible as a sustained amp vector even when real_recursive is enabled.
Rate limit is global (not per-source) so IP-spoofed amplification floods
hit the ceiling regardless of how many source addresses are rotated.
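A rolling-window budget of this kind can be sketched as a small class. `ForwardBudget` and its injectable clock are assumptions made for testability; the defaults mirror DNS_FORWARD_BUDGET and DNS_FORWARD_WINDOW.

```python
import time
from collections import deque


class ForwardBudget:
    """Global (not per-source) cap on upstream forwards per time window."""

    def __init__(self, budget=50, window=1.0, clock=time.monotonic):
        self.budget = budget
        self.window = window
        self._clock = clock
        self._calls = deque()  # timestamps of recent upstream calls

    def can_forward(self) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) >= self.budget:
            return False  # caller answers with the sinkhole instead
        self._calls.append(now)
        return True
```

Because the deque is shared across all sources, rotating spoofed source addresses does not buy an attacker any extra budget.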
When DNS_REAL_RECURSIVE=true and DNS_ZONE_MODE=recursive, out-of-zone
queries are forwarded to DNS_UPSTREAM (default 8.8.8.8:53) via async
UDP. Upstream response is relayed as-is; on timeout or error the
already-computed sinkhole (127.x) is returned instead.
_handle() always runs first so logging, tunneling detection, flood
tracking, and recon-burst aggregation fire on every query regardless
of whether the response ultimately comes from upstream. _dispatch()
overlays forwarding on top of the sync handler.
Protocol handlers (UDP datagram_received, TCP session) are now async
via asyncio.ensure_future / await _dispatch(). Service class exposes
real_recursive (bool) and upstream (string) config fields.
RA=1 + empty answer section is immediately detectable as fake by any
open-resolver scanner. Recursive mode now behaves like open mode
(127.0.0.x sinkhole, deterministic on qname) with RA=1 and AA=0,
matching what a real recursive resolver returns.
- Add per-src QPS counter (_qps_window) with flood_suspect event at ≥50 qps/10s;
one event per src per 30s cooldown, does not suppress baseline query events.
- Add tracking_evicted telemetry every 100 LRU evictions so IP-rotation evasion
of _txt_times/_qps_window/_recon_window is observable, not silent.
- Shared _track_lru helper consolidates LRU touch + eviction signalling across
all three bounded OrderedDicts.
- Add TYPE_AAAA=28 support: _fake_ipv6() returns deterministic ULA (fd00::/8)
addresses for in-zone names; extra_records parser now accepts and validates
AAAA entries via socket.inet_pton.
- Add per-src recon-burst aggregation (_recon_window): fingerprint_probe +
zone_transfer + amp_probe are tracked per source in a 60s window; recon_burst
fires when ≥2 distinct signal types seen, once per src per 120s cooldown.
- 47 tests passing (19 new across TestAAAARecords, TestFloodDetection, TestReconBurst).
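The recon-burst rule (two or more distinct signal types in 60s, one event per source per 120s) can be sketched as below. `ReconTracker` is an illustrative name, and the sketch leaves out the LRU bound that the real _recon_window applies.

```python
from collections import deque


class ReconTracker:
    def __init__(self, window=60.0, cooldown=120.0):
        self.window = window
        self.cooldown = cooldown
        self._seen = {}        # src -> deque of (timestamp, signal type)
        self._last_fired = {}  # src -> timestamp of the last recon_burst

    def observe(self, src, signal, now) -> bool:
        """Record one signal; return True when recon_burst should fire."""
        events = self._seen.setdefault(src, deque())
        events.append((now, signal))
        while events and now - events[0][0] > self.window:
            events.popleft()
        if len({sig for _ts, sig in events}) < 2:
            return False
        last = self._last_fired.get(src)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down for this source
        self._last_fired[src] = now
        return True
```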
Python asyncio DNS server on UDP+TCP/53 masquerading as BIND 9.x.
Emits five event_type values: query, fingerprint_probe (version.bind /
hostname.bind / id.server CHAOS), zone_transfer (AXFR/IXFR, always
REFUSED), amp_probe (qtype=ANY or EDNS udp_size>1232), and
tunneling_suspect (long high-entropy labels or rapid TXT burst).
Zone persona is generated per-decky from instance_seed (domain name,
SOA serial, NS, A, MX, TXT SPF); overridable via config_schema.
Three zone modes: auth (default), recursive, open (sinkhole).
AttackerData type gets bgp_prefix / rpki_status / rpki_source.
TimelineSection renders prefix inline next to AS number; RPKI status
shows as a green RPKI VALID / red RPKI INVALID badge, or dim
NO ROA for not-found. rpki-status-badge CSS added to Dashboard.css.
Export network block extended with the three new fields.
Import enrich_rpki from decnet.rpki and call it inline after the
ASN lookup. bgp_prefix, rpki_status, rpki_source added to the
record dict that feeds the Attacker upsert. enrich_rpki short-circuits
to (None, None) when asn is None, so private / unannounced IPs
never hit RIPE STAT.
bgp_prefix (max 43 chars, indexed) holds the covering CIDR from
the ASN lookup. rpki_status / rpki_source hold RIPE STAT validation
outcome. All nullable — null means enrichment was skipped or ASN
did not resolve.
RipeStatValidator makes two RIPE STAT calls per uncached IP:
network-info -> announced prefix, rpki-validation -> ROA state.
2-second timeout; any network failure returns status='unknown'.
SQLite cache keyed by IP, 12-hour TTL, pruned on validator init.
Cache avoids per-event HTTP for the high-churn attacker pool —
steady-state cost approaches zero for repeat offenders.
Synthesize the covering CIDR at lookup time from the matched iptoasn
range using ipaddress.summarize_address_range. AsnInfo.prefix is
populated per-query; not persisted in the pickle cache.
enrich_ip now returns (asn, as_name, bgp_prefix, provider_name).
Profiler worker updated to unpack the 4-tuple and write bgp_prefix
into the attacker record dict.
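A sketch of deriving a prefix from a matched range with the standard library. `covering_prefix` is a hypothetical helper; since a range need not align to a single CIDR, the sketch picks the summarized block that contains the queried address, which is an assumption about how the covering CIDR is chosen.

```python
import ipaddress


def covering_prefix(ip, range_start, range_end):
    """Return the CIDR within [range_start, range_end] that contains ip."""
    addr = ipaddress.ip_address(ip)
    first = ipaddress.ip_address(range_start)
    last = ipaddress.ip_address(range_end)
    # summarize_address_range yields the minimal list of networks
    # that exactly covers the inclusive range.
    for net in ipaddress.summarize_address_range(first, last):
        if addr in net:
            return str(net)
    return None  # ip lies outside the range
```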
Four RFC 4443 stimuli (port-unreach, hop-limit-exceeded, unknown-NH,
bad-dest-option) produce a 4-char matrix + sha256 fingerprint for IPv6
attackers. Auto-registers via ActiveProbeMeta at priority=860 (after v4
icmp_error=850, before ipv6_leak=999). IPv4 targets fast-return None.
Sends four crafted stimuli (UDP/closed-port, TTL=1, DF+oversized,
bad IP option) and records which ICMP error classes come back, the
per-error RTT, and the bytes echoed in each ICMP body. Absence is
as informative as a reply — Linux rate-limiting is a fingerprint signal.
Returns None when no packets could be sent (no CAP_NET_RAW), so the
probe is a no-op in non-root test environments. Port-free ActiveProbe
subclass (priority=850), metaclass auto-registered in the registry.
Also fixes three sets of stale tests left over from the TlsCertProbe
migration (4b2759e0):
- test_active_probe_registry: closed name/order sets updated for
tls_certificate and icmp_error
- test_prober_rotation: dead patches on worker.fetch_leaf_cert removed
- test_prober_worker (TestProbeCycleTLSCert): rewritten to test
TlsCertProbe as an independent registry probe, patch target updated
from worker.fetch_leaf_cert to probes.tlscert_probe.fetch_leaf_cert
TLS cert capture was the last prober special-case that bypassed
ActiveProbeMeta. Moves logic into TlsCertProbe (priority=200, runs
after JARM) in probes/tlscert_probe.py; drops _capture_tls_cert,
the probe.probe_name=="jarm" name-check, and the direct
fetch_leaf_cert import from worker.py.
ActiveProbe.run/syslog_fields/publish_payload now accept port=None so
non-port-iterating probes can live in the registry. Ipv6LeakProbe replaces
the hand-rolled _ipv6_leak_phase special case in worker.py; it runs last
via priority=999. _probe_cycle no longer has an ad-hoc phase call.
Fixes three stale test files (test_prober_bus, test_prober_rotation,
test_prober_worker) that had been broken since the 916b21b6 registry refactor.
_route_info() calls _ip_route_get once and returns (on_link, iface);
worker._ipv6_leak_phase now calls it instead of the two separate helpers.
Bare except clauses at _ip_route_get and response parse now log at debug.
ingester: wrap bootstrap get_state() in a retry-forever loop. When MySQL
came up after the API process, the failed bootstrap call killed the
ingestion task permanently, before it ever entered _run_loop. Regression
test added.
deps: idna 3.13→3.15 (CVE-2026-45409), twisted 26.4.0rc2→26.4.0
(PYSEC-2026-160), pip 26.1→26.1.1 (CVE-2026-3219 resolved upstream),
behave-core/behave-shell renamed from decnet-behave-* and bumped to 0.1.1.
pre-commit hook updated to reflect current ignore list.
Replace _jarm_phase / _hassh_phase / _tcpfp_phase boilerplate (3×~50
lines of identical port-iteration logic) with a metaclass-registered ABC.
Adding a new port-iterating active probe is now one class + three methods.
- decnet/prober/base.py: ActiveProbeMeta auto-registers subclasses by
probe_name; ActiveProbe ABC enforces run/syslog_fields/publish_payload
with env-driven DECNET_PROBE_PORTS_<NAME> port override.
- decnet/prober/probes/{jarm,hassh,tcpfp}.py: concrete probe classes.
- decnet/prober/worker.py: single _run_probe driver replaces the three
phase functions; _probe_cycle iterates ActiveProbeMeta.all(); drops
the ports=/ssh_ports=/tcpfp_ports= kwargs from prober_worker.
- IPv6 leak and TLS cert capture stay as special cases (different call
shapes; intentionally outside the registry).
- tests/prober/test_active_probe_registry.py: registry contents, sort
order, priority-10 override, ABC contract per probe class.
- tests/prober/test_run_probe_driver.py: dedup, success, None-skip,
exception, rotation, publish paths for _run_probe.
- tests/prober/test_prober_worker.py: updated patch targets and
_probe_cycle call sites; port control via monkeypatch.setattr.
- Add "ipv6_leak" to KNOWN_SOURCE_KINDS in ttp/base.py
- Register Ipv6LeakLifter(store) in factory.py get_tagger()
- Subscribe worker to attacker.fingerprinted; route by Event.type
so JARM/HASSH/ipv6_leak share the topic without source_kind collision
- Add bump_attacker_ipv6_leak() to BaseRepository (abstract) +
TTPMixin (implementation): increments ipv6_leak_count, sets last_ipv6_*
denorm fields, appends-with-dedup to AttackerIdentity.ipv6_link_local_iids
- Call bump_attacker_ipv6_leak from _process_event after insert_tags
- Add DummyRepo stub + coverage call in tests/db/test_base_repo.py
Add inline documentation for all known kind= discriminators on the
fingerprinted topic including the new ipv6_leak variant so future
consumers know what fields to expect without reading the prober source.
Ipv6LeakLifter subscribes to source_kind="ipv6_leak" events from both
the passive sniffer and active prober. Emits T1090 (Proxy) under TA0011
(C2) when an fe80:: source address is observed: the attacker's VPN tunnels
only IPv4, so their link-local IID leaks their NIC identity.
Rule R0059 sets base confidence 0.85; iid_kind in the evidence carries
the per-observation strength (eui64 = MAC-derived, deterministic;
stable_privacy = RFC 7217; temporary = RFC 4941).
Add ipv6_leak.py with solicit_ipv6_leak() — sends ICMPv6 Echo to
ff02::1 on the attacker's iface and returns fe80:: evidence when a
link-local response arrives. Gated on _is_on_link(): skips when
attacker is behind a router (no L2 adjacency).
Add _ipv6_leak_phase() to worker.py (Phase 4 in _probe_cycle).
Phase runs once per attacker IP per cycle (sentinel at port 0 in
ip_probed["ipv6_leak"]) and publishes kind="ipv6_leak" via publish_fn.
Add list_v6_addrs(iface) to network.py: returns [(addr, scope)] for
all IPv6 addresses on an interface, required for source-routing ICMPv6
from the correct link-local address.
Add _ipv6_iid_classify() to fingerprint EUI-64 vs stable-privacy IIDs
and derive the MAC OUI from EUI-64-encoded link-local addresses.
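EUI-64 detection rests on the ff:fe marker in the middle of the interface identifier and the inverted universal/local bit (RFC 4291, Appendix A). A sketch with an illustrative name; it does not attempt to tell stable-privacy from temporary IIDs and reports everything non-EUI-64 as "other":

```python
import ipaddress


def classify_iid(addr: str):
    """Return (kind, oui): ("eui64", "aa:bb:cc") or ("other", None)."""
    ip = ipaddress.IPv6Address(addr.split("%")[0])  # drop any zone id
    iid = ip.packed[8:]  # low 64 bits: the interface identifier
    if iid[3] == 0xFF and iid[4] == 0xFE:
        # Undo the EUI-64 expansion: flip the U/L bit, drop ff:fe.
        mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
        oui = ":".join(f"{b:02x}" for b in mac[:3])
        return "eui64", oui
    return "other", None
```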
SnifferEngine._on_ipv6_packet() observes fe80::/10 sources destined for
known deckies and emits ipv6_link_local_leak syslog + bus events.
on_packet() now dispatches the IPv6 branch before the v4 TCP path.
BPF default widened from "tcp" to "tcp or ip6" so the sniff loop
captures IPv6 frames without config change.
Attacker gains five denormalized cache fields (ipv6_leak_count,
last_ipv6_leak_at, last_ipv6_link_local, last_ipv6_iid_kind,
last_ipv6_mac_oui) mirroring the rotation_count/last_rotation_at pattern.
AttackerIdentity gains ipv6_link_local_iids (JSON list[dict]) for
EUI-64-derived MAC cluster signals that survive VPN/IP rotation.
No ALTER TABLE helpers — direct SQLModel column additions per pre-v1 policy.
Pins the evidence shape for IPv6 link-local leakage findings. All fields
optional (total=False) so partial observation (passive sniffer vs active
solicitation) fills whatever the vector provides. Lifter lands in a
subsequent commit.