Two sub-routers under /api/v1/canary:
blobs (operator-uploaded artifacts, deduped by sha256):
- POST /blobs (multipart upload; admin)
- GET /blobs (list with token_count; admin)
- DELETE /blobs/{uuid} (refcount-aware; 409 when referenced; admin)
tokens (per-decky planted artifacts):
- POST /tokens (generate or instrument + plant; admin)
- GET /tokens?decky_name=&kind=&state= (filter; viewer)
- GET /tokens/{uuid} (detail; viewer)
- GET /tokens/{uuid}/preview (instrumented bytes; admin)
- GET /tokens/{uuid}/triggers (paged callback log; viewer)
- DELETE /tokens/{uuid} (revoke + bus event; admin)
XOR validation: exactly one of blob_uuid / generator must be set.
Path validation rejects relative/NUL/newlines/.. segments. Every
body-bearing route documents 400 plus 401/403/404 as applicable.
Stdlib MIME sniffer (no python-magic dep) covers PNG/JPEG/GIF/PDF/
HTML/XML/DOCX/XLSX/JSON/YAML/TOML/text/plain; everything else falls
through to passthrough.
Tests run end-to-end through the live FastAPI app (planter docker
exec is patched); 17 cases covering dedup, refcount, lifecycle,
XOR validation, path validation, and 404 paths.
decnet canary launches the HTTP + DNS callback receiver via
decnet.canary.worker.run. Mirrors the shape of decnet webhook
(typer command with --daemon flag, asyncio.run in the foreground).
Deliberately NOT added to MASTER_ONLY_COMMANDS — every host that
hosts deckies runs its own canary worker, and the bus events stay
local to that host (per-host webhook fanout handles SIEM egress).
decnet canary worker hosts both callback surfaces in one process:
- HTTP: a tiny FastAPI app on its own port (default 8088). The only
meaningful route is GET /c/{slug} which looks up the slug, persists
a CanaryTrigger, publishes canary.<id>.triggered, and returns a 1x1
transparent GIF. Unknown slugs return the same response (stealth);
no decnet strings leak in headers/banners; docs/openapi/redoc are
disabled. X-Forwarded-For is honored.
- DNS: an authoritative UDP server for *.<canary_zone> using
asyncio.DatagramProtocol with stdlib-only DNS wire-format parsing
(no dnslib dep). Same lookup -> persist -> publish flow, plus a
sinkhole A record (192.0.2.1) so the attacker's resolver doesn't
loop on NXDOMAIN. Single-label slugs only; multi-label probes
return NXDOMAIN. Pointer loops in malformed queries are caught
(10-hop cap) so an adversarial packet can't wedge the parser.
Tests cover both surfaces without privileged sockets:
- HTTP via Starlette TestClient: known/unknown slug, headers, XFF,
stealth-string assertions.
- DNS via direct DatagramProtocol drive: known slug -> ANSWER,
unknown -> NXDOMAIN, pointer-loop -> ValueError, malformed
packet -> silent drop.
Plant / revoke / seed_baseline using the same docker-exec-with-sh-c
pattern proven by decnet/orchestrator/drivers/ssh.py:_run_file.
Each plant call composes a single sh script:
mkdir -p <dirname> && printf %s <base64> | base64 -d > <path> &&
chmod <mode> <path> && touch -d @<mtime> <path>
Base64-on-the-host / decode-in-the-container keeps binary artifacts
(DOCX/PDF/PNG) safe across the argv boundary; the placement_path,
mode, and mtime are shlex-quoted.
State transitions hit the repo: planted -> failed on docker error
with stderr captured into last_error. Bus events fire on success
(canary.<id>.placed) and on revoke (canary.<id>.revoked) — wrapped
in try/except so a downed bus never blocks a placement.
seed_baseline(decky_name, repo) is the deploy-hook entry point —
reads DECNET_CANARY_BASELINE (default git_config,env_file,honeydoc,
aws_creds), persists one row per generator, plants each. Failed
placements are logged but do NOT abort; the deployer hook treats
the return list as informational.
Seven instrumenters that mutate operator-supplied artifacts to
embed the callback URL:
- passthrough — bytes unchanged; only DNS-callback tokens trip
detection, with the slug embedded in the placement path
- plain — substitutes {{CANARY_URL}}/{{CANARY_HOST}} placeholders;
falls back to appending a comment line whose prefix adapts to the
apparent file syntax (#, //, ;)
- html — injects a 1x1 tracking pixel before </body>, appends
if the close tag is missing
- docx — direct zipfile manipulation (no python-docx dep):
inserts an external-image Relationship into word/_rels/document.xml.rels
and a matching <w:drawing> element before </w:body>
- xlsx — sibling of docx; injects an external-image relationship
into xl/_rels/workbook.xml.rels (orphan rels are still fetched on
open by most viewers)
- pdf — uses pikepdf to install /OpenAction /URI on the catalog;
rejects with a clear message when pikepdf isn't installed
- image — uses Pillow to embed slug + URL in PNG tEXt / JPEG
comment; rejects with a clear message when Pillow isn't installed
DOCX and XLSX share the rId allocator + relationship injector via
the docx module; both work on stdlib zipfile only.
Tests synthesise minimal real DOCX/XLSX fixtures inline, round-trip
each instrumenter, and assert the callback URL ends up in the
mutated bytes while the file still parses.
Five built-in generators that produce deterministic fake artifacts
keyed by the token slug:
- aws_creds — passive [default]/[prod] credentials block, no
callback wiring (AWS-key tokens require an external
trap, which is post-v1)
- git_config — .git/config with origin url = http_base/c/<slug>/repo.git
- env_file — .env with API_BASE_URL + WEBHOOK_NOTIFY_URL embedding
the callback URL plus inert realism filler
- ssh_key — PEM-shaped fake private key whose host comment carries
<slug>.<dns_zone> when DNS is deployed, else the
http_base host
- honeydoc — minimal HTML report with a 1x1 tracking-pixel <img>
whose src is the callback URL; fallback for the
deploy-time baseline before the operator uploads a
real DOCX/PDF
Tests assert byte-stability (same ctx -> same bytes), slug presence
in the embedded fields, that aws_creds is intentionally URL-free,
and that every artifact carries operator-facing notes for the
preview endpoint.
Mirrors the decnet.intel layout (base + factory + lazy concrete
imports). Defines:
- CanaryArtifact / CanaryContext dataclasses + the generator and
instrumenter ABCs they share
- factory dispatch for generators (git_config/env_file/ssh_key/
aws_creds/honeydoc) and instrumenters (docx/xlsx/pdf/html/image/
plain/passthrough), plus pick_instrumenter_for_mime() for MIME-driven
dispatch on operator uploads
- persona-aware default placement paths (Linux vs. Windows-shaped)
and absolute-path validation that the API will use to validate
operator-supplied placement_path values
- on-disk blob store: sha256-keyed two-level fan-out, idempotent
writes, refcount-aware unlink (the DB row is the source of truth)
Also covers prior commits' tests (bus topics, models, repo CRUD)
under tests/canary/. 79 tests, all pass.
Adds the abstract surface on BaseRepository and the SQLModel-backed
implementation (shared by SQLite and MySQL) for:
- canary blobs (upsert-by-sha256, list-with-refcount, refcount-aware delete)
- canary tokens (create, slug lookup, list with filters, state update)
- canary triggers (record+bump-counters atomically, list, attribute)
The triggers path is a single session that inserts the row and bumps the
parent token's counters together, so a subscriber that reads the token
right after the bus event sees the updated count. Blob delete refuses
while any token (including revoked) still references the blob; pre-v1
revoked tokens stick around for forensic value.
Three new tables for the canary tokens feature:
- canary_blobs — operator-uploaded source artifacts, deduped by sha256
- canary_tokens — one planted artifact in one decky; carries the
callback slug, generator/instrumenter, and lifecycle
- canary_triggers — append-only log of every callback hit; attacker_id
back-filled by the correlator
Pydantic request/response shapes live in the same file per the
single-source-of-truth convention. No migrations file — pre-v1
SQLModel.metadata.create_all() covers it.
Reserved topic family for the upcoming canary-tokens feature so the
correlator and webhook fanout can subscribe to canary.> from day one.
No producers yet; planter, decnet canary worker, and API will publish
in subsequent commits.
* decnet/templates/{rdp,smb}/ntlmssp.py — minimal Type 3 (Authenticate)
parser shared between the SMB and RDP-NLA templates. Lands NTLM
creds in the universal Credential table with secret_kind=ntlmssp_v1
/ ntlmssp_v2 and secret_b64 = base64 of the NtChallengeResponse so
the bounty pipeline can feed the right hashcat mode.
* scripts/decnet-init.sh — convenience wrapper around `sudo decnet init
--force` that targets the current working directory; saves operators
retyping the install paths during dev iterations.
New dashboard surface for editing the global emailgen persona pool —
the JSON file fleet (MACVLAN/IPVLAN) and SWARM-shard mail deckies pull
from. MazeNET topology personas are out of scope here; they're
configured per-topology in the topology editor.
Backend:
* GET/PUT /api/v1/emailgen/personas — admin-write, viewer-read. PUT
validates with the same Pydantic schema the worker uses
(parse_personas), drops invalid entries with a warning, returns 400
only when the entire payload fails. Path is operator-discoverable
on every response so a CLI-driven backup workflow stays visible.
Frontend:
* PersonaGeneration.tsx + .css — table + add/edit modal with the full
EmailPersona schema (name, email, role, tone, mannerisms list,
language, signature, active hours, reply latency, uses_llms_heavily).
Local edits are batched; explicit "SAVE CHANGES" writes back, with a
dirty-indicator pill and a "DISCARD" reset. Email uniqueness is
enforced client-side so the scheduler never picks the same persona
as both sender + recipient.
* Sidebar AUTOMATION group gains a "Persona Generation" entry next to
Orchestrator; route registered at /persona-generation.
The worker reads the same on-disk file the API writes — see
decnet.orchestrator.emailgen.global_pool. The API resets the
in-process cache on every read/write so the worker picks up dashboard
edits within its next tick rather than waiting on mtime.
The SSE pipe at /orchestrator/events/stream was already streaming
'orchestrator.email.{decky_uuid}' events (the subscription is for the
'orchestrator.>' wildcard), but the consumer side dropped them on the
floor. Three fixes to close the loop:
* useOrchestratorStream.ts now registers an 'email' SSE listener — the
EventSource silently ignores frames whose event name has no listener,
so missing this entry meant every email frame was dropped before
reaching the page's onEvent handler.
* /api/v1/orchestrator/events accepts kind=email and dispatches to
list_orchestrator_emails, adapting rows to the existing wire shape:
subject -> action, sender_email -> src_decky_uuid, recipient_email
-> dst_decky_uuid, plus email-specific extras (thread_id, language,
mail_decky_uuid, message_id, in_reply_to) ride along as top-level
keys.
* Orchestrator.tsx gains an 'email' tab in the kind filter and a
branch in the row renderer / inspector that:
- shows full sender / recipient (no UUID truncation),
- chips the language code next to the subject,
- relabels ACTION as SUBJECT in the inspector and surfaces
thread / in-reply-to / mail-decky details.
The 'all' tab continues to show traffic+file only (today's behavior);
operators see emails by switching to the email tab. A union view at
the API layer is the obvious follow-up but not necessary for now.
Two-layer gating per CLAUDE.md:
- registration-time: emailgen added to MASTER_ONLY_GROUPS so agents
don't see the sub-app in 'decnet --help' at all.
- body-guard: _require_master_mode('emailgen ...') at the top of every
sub-command body so a direct callable import (third-party tooling)
still bails on agent hosts.
Matches the convention used for 'swarm', 'topology', 'geoip'. SWARM
agents push their generated mail through the master's emailgen worker
(or none at all); cross-agent emailgen federation stays out of scope.
Lift the Ollama subprocess shell-out out of EmailDriver and into a
proper provider subpackage shape:
decnet/orchestrator/emailgen/llm/
base.py — LLMBackend Protocol + LLMResult + LLMTimeout
factory.py — get_llm() reads DECNET_EMAILGEN_LLM
impl/ollama.py — current 'ollama run' subprocess path
impl/fake.py — canned-output backend used by tests
Driver now takes an LLMBackend on construction (or inherits the
factory default). Tests inject FakeBackend instead of monkeypatching
the subprocess layer, which is cleaner and ~10x faster. Swapping
Ollama for the Anthropic API / vLLM / llama.cpp is now a third branch
in factory.py; no driver rewrite needed.
Mirrors the convention used by decnet.web.db.factory + decnet.bus.factory
per the provider-subpackages-from-day-one rule in memory.
Two changes that unwind earlier MazeNET-only assumptions and fix a
realism tell:
1. Persona resolution is now per-decky-source, not topology-only. The
scheduler walks the union view (list_running_deckies, including
fleet MACVLAN/IPVLAN + SWARM shards) and picks the right persona
list for each source:
* topology decky -> Topology.email_personas (per-topology richness
preserved)
* fleet / shard -> a single host-wide pool loaded from disk
(DECNET_EMAILGEN_PERSONAS, /etc/decnet/email_personas.json, or
~/.decnet/email_personas.json)
Operators install the global pool via 'decnet emailgen
import-personas <file>' which validates with the same Pydantic
schema the worker uses.
2. The driver now runs 'touch -d <Date>' inside the docker exec right
after the EML write so file mtime matches the email's RFC 2822
Date: header. Without this an attacker 'ls -lt'ing the spool sees
every email clustered inside the worker's tick window — the
cluster itself was a stylometric tell.
CLI now exposes 'decnet emailgen' as a sub-app with 'run' (default,
backwards-compatible with bare 'decnet emailgen') and 'import-personas'.
list_running_deckies carries topology_id through so consumers can resolve
the parent topology without a second round-trip.
When IMAP_EMAIL_SEED / POP3_EMAIL_SEED points at a directory of .eml
files (the orchestrator emailgen worker's drop path,
/var/spool/decnet-emails/ by convention), the bait mailbox is replaced
with those LLM-generated, persona-driven, threaded messages. Empty /
missing dir keeps the hardcoded fallback so a fresh deployment is never
silent. Cached with mtime invalidation + a short TTL so a hot mailbox
doesn't pay the parse cost on every IMAP/POP3 command.
Replaces the DEBT-026 stub on both templates that named the env var but
never wired it through.
Second orchestrator worker (decnet emailgen) that drips persona-driven,
threaded, multi-language fake emails into running mail deckies. Personas
live on Topology.email_personas; topology-wide language_default falls
through to any persona that doesn't pin its own. Em-dashes are
suppressed at the prompt layer by default and only lifted for personas
explicitly marked uses_llms_heavily — em-dashes are an LLM tell and a
flat corpus of em-dashed mail is a giveaway.
EML delivery writes into /var/spool/decnet-emails/<thread>/<msg>.eml on
the mail decky via docker exec; wiring the IMAP/POP3 templates to read
from that spool (replacing the hardcoded _BAIT_EMAILS) is the next step.
Once the orchestrator started seeing fleet + SWARM shard sources via
list_running_deckies (a844148), every event row landing on a fleet decky
broke the FK to topology_deckies — the column now carries opaque ids
("local:omega-decky" for fleet, "host_uuid:decky_name" for shards) that
will never match topology_deckies.uuid.
Symptom on the operator's mothership:
IntegrityError 1452 — orchestrator_events_ibfk_2 FK violated on every
tick once the reconciler populated fleet_deckies.
Index on dst_decky_uuid is preserved (the dashboard reads
"events for this decky" frequently); only the FK is removed. Keeps
data integrity loose by design — events are append-only history that
should outlive the deckies they reference.
Existing MySQL deployments need the FK dropped manually:
ALTER TABLE orchestrator_events
DROP FOREIGN KEY orchestrator_events_ibfk_2,
DROP FOREIGN KEY orchestrator_events_ibfk_1;
SQLite users are unaffected — SQLite doesn't enforce FKs by default.
The Workers panel (Config → Workers tab) hardcodes its row list in
KNOWN_WORKERS — by design, so a rogue publisher can't inject UI rows.
Three heartbeat-emitting workers were missing:
* clusterer — behavioral clustering (decnet/clustering/)
* campaign-clusterer — campaign assembly (decnet/clustering/campaign/)
* reconciler — host-local fleet convergence (added in 430262e)
Each already publishes on system.<name>.health via run_health_heartbeat,
so they show up live the moment they're added to the registry — no
frontend or subscriber wiring needed (Config.tsx renders whatever
/workers returns).
Also added to _PREFERRED_ORDER in start-all so START ALL WORKERS brings
them up in dependency-friendly order: data-plane → reconciler → intel
→ clustering → output → orchestrator.
Three deployable units (listener, web, swarmctl) intentionally remain
absent from KNOWN_WORKERS — they don't emit heartbeats (CLI / static
server / one-shot tooling), so they'd permanently render as UNKNOWN
and confuse operators. Adding them is a separate decision that needs
a "synthesize installed-but-silent rows" pass on the registry.
Two pieces, one PR because they share a deployment surface:
1. systemd. decnet-reconciler.service.j2 mirrors the orchestrator unit
shape (docker group, hardened sandbox, append-logs). Read-only
/var/lib/decnet so it can read decnet-state.json without write
access. Auto-discovered by `decnet init` via the existing
decnet-*.service.j2 glob — no init.py change needed. Added to
decnet.target so `systemctl start decnet.target` brings it up
alongside collector / sniffer / mutator / etc. Also added to the
agent reaper script so self-destruct cleans it up on workers.
2. Bus signal. reconcile_once now publishes
`decky.<host_uuid:name>.state` on every insert / delete /
state-changed transition. Reuses the existing DECKY_STATE topic
family (no bus/topics.py change → no wiki update needed per the
bus-signals doc rule). Composite host_uuid:name segment keeps
fleet rows distinguishable from MazeNET TopologyDecky rows whose
ids are bare UUIDs. Quiet ticks publish nothing — convergence
means silence.
Bus is plumbed through the worker, defaults to None for unit-test
callers. publish_safely keeps the source-of-truth contract: DB write
is authoritative, the publish is best-effort notification.
Captures previous_state into a local before update_fleet_decky_state
runs — a fake repo that mutates rows in-place would otherwise see the
post-update state and report previous == current. Real repos don't
have this concern but the fix is cheap and makes the function less
order-dependent.
Switches _one_tick from list_running_topology_deckies to
list_running_deckies (the union view added in 095500a). Resolves the
permanent "no actionable deckies (running+ssh count=0)" log on hosts
running only unihost MACVLAN / IPVLAN decoys — the orchestrator now
sees fleet_deckies rows alongside MazeNET topology rows and SWARM
DeckyShard rows.
Also fixes the misleading log message: the old "running+ssh count=N"
reported the *pre-filter* total (count of all running deckies, not
the SSH-eligible subset that scheduler.pick actually evaluates). New
line breaks down running, ssh_eligible, and per-source counts so
debugging "why isn't it picking?" no longer requires reading
scheduler internals.
Regression test: orchestrator integration suite now seeds fleet_deckies
rows (not just topology_deckies) and verifies a tick picks them and
records an event with dst="local:fleet-*" — proving the original bug
on the operator's mothership is fixed.
Adds decnet.fleet.reconciler — a pure async function plus a long-lived
worker — that periodically reconciles the three sources of truth on a
DECNET host:
1. decnet-state.json (CLI-canonical fleet record)
2. fleet_deckies table (DB mirror, written by engine.deployer)
3. docker inspect (actual per-container runtime state)
Drift handling:
* JSON has X, DB doesn't → INSERT (deploy ran with DB offline)
* DB has X (this host), JSON doesn't → DELETE (teardown ran with DB offline)
* Both have X, docker disagrees → flip state to running/failed/degraded
* Docker socket unreachable → leave existing state alone (don't
torch every row to torn_down)
Cross-host safety: deletions are scoped to host_uuid for the local host;
a master that runs both a local fleet and swarm workers will never
clobber a peer's slice.
CLI:
decnet reconcile --once # one-shot, prints counts
decnet reconcile [--interval N] # long-lived worker, mirrors
# orchestrator's lifecycle (control
# listener + heartbeat + tick loop)
Promotes decnet/fleet.py → decnet/fleet/ package so the reconciler can
live alongside it without name collision (build_deckies_from_ini and
all_service_names re-exported unchanged via __init__.py).
14 new tests cover state aggregation rules, all four drift directions,
host_uuid scoping, docker-unreachable safety, and worker shutdown via
the bus control event.
The unihost API path delegates to engine.deployer.deploy(), which now
writes both decnet-state.json (existing) and the fleet_deckies DB
table (added in 646aeec). Comment makes the single-sink design
explicit so future maintainers don't add a parallel save_state /
upsert_fleet_decky call here.
No behavioral change — every fleet-creation path on every host (CLI
deploy, this unihost API path, and per-worker SWARM agent deploys)
already routes through the engine.deployer single sink.
CLI deploy now writes both surfaces: decnet-state.json (existing,
canonical for offline / no-API hosts) and the new fleet_deckies DB
table (visible to orchestrator, web dashboard, REST API).
Best-effort: a DB outage logs a warning and returns. The JSON file
remains the source of truth for `decnet status`, `decnet teardown`,
sniffer, and collector — operators on a CLI-only host keep working.
_run_async helper bridges sync deploy() into the async repository.
Always uses a fresh thread because the API handler at
web.router.fleet.api_deploy_deckies invokes deploy() from inside a
FastAPI event loop, which would otherwise break asyncio.run.
Verified end-to-end against MySQL: deploy mirror inserts rows, union
view (list_running_deckies) returns them with source="fleet",
teardown mirror removes them. Works from both sync (CLI) and async
(API handler) call sites.
Adds a fleet_deckies table so DB-only consumers (orchestrator, web
dashboard, REST API) can see unihost / MACVLAN / IPVLAN deckies
without reading the JSON state file. Mirrors DeckyShard field-for-field.
Composite PK (host_uuid, name) future-proofs for a mothership that
runs both a local fleet and acts as a swarm master. host_uuid defaults
to the "local" sentinel — no FK to swarm_hosts because the local
mothership isn't enrolled as a worker.
Repo additions: upsert_fleet_decky, delete_fleet_decky,
list_fleet_deckies, list_running_fleet_deckies,
update_fleet_decky_state, plus list_running_deckies which unions
topology + fleet + shard sources for the orchestrator.
Smoke-tested round-trip against MySQL: upsert, list_running, union
view (source="fleet"), delete.
Mirrors the IP-ID classifier for TCP ISN values: per-source-IP rolling
deque (maxlen=8) populated from each inbound SYN's tcp.seq, classified
on every emission. A 'random' verdict is the modern norm; 'incremental',
'zero', or 'constant' indicates legacy stacks or hand-rolled raw-socket
tooling — a strong fingerprint signal.
Active prober now also captures server_isn (single sample, not classified
in-flight; downstream consumers correlating multi-probe results can apply
seq_class.classify_sequence themselves).
Profiler rollup carries the latest non-'unknown' label into
attacker.tcp_fingerprint. Dedup key already covers isn_class from
the previous commit, so transitions emit cleanly.
UI surfaces ISN class as a colour-coded tag with a ⚠ glyph for
non-random verdicts, since they're the genuinely interesting case.
Adds a per-source-IP rolling sample buffer (deque, maxlen=8) for IP-ID
values seen on attacker SYNs and a stdlib-only classifier in
decnet/sniffer/seq_class.py. Each new SYN appends ip.id and re-classifies
the buffer; the result is logged on tcp_syn_fingerprint events alongside
sample count.
The dedup key now folds in ipid_class so a transition from 'unknown' to
a definitive verdict emits exactly one fresh event instead of being
suppressed by the old (os|options) key. Profiler rollup carries the
latest non-'unknown' label into attacker.tcp_fingerprint.
UI surfaces it as a colour-coded tag in the TCP STACK panel: random
neutral, incremental amber, zero/constant green (the strong signal).
Active prober now reads ip.tos from the SYN-ACK and emits tos/dscp/ecn
alongside the existing TTL/window/options fields. dscp is folded into the
fingerprint hash so different DSCP markings produce distinct signatures.
Passive sniffer logs the same three fields on tcp_syn_fingerprint events;
profiler rollup carries them into the attacker tcp_fingerprint snapshot;
AttackerDetail's TCP STACK panel now surfaces DSCP and ECN cells.
Every 100 ticks, trim per-dst_decky_uuid history down to 10000 rows
(oldest first). Keeps the events table bounded on long-running fleets
without paying the cost on every write.
GET /api/v1/orchestrator/events — paginated list with optional
kind=traffic|file filter. GET /api/v1/orchestrator/events/stream —
SSE: snapshot on connect, live forward of orchestrator.> bus events
mapped to 'traffic' / 'file' SSE event names.
Repo gains list_orchestrator_events(limit, offset, kind?, since_ts?),
count_orchestrator_events(kind?), and prune_orchestrator_events
(per_dst_cap=10000) for periodic worker-side trimming.
Aligns the bus token with the DB column value; OrchestratorEvent.kind
is 'traffic'/'file' but the topic was 'activity'/'file'. The asymmetry
made consumer code (UI filter, SSE event names) need a translation
layer. No external subscribers existed yet.
Adds a new decnet orchestrate worker whose job is to keep the honeypot
ecosystem from looking suspiciously static — a frozen LAN with no
inter-host traffic and no filesystem aging is its own honeypot tell.
MVP scope:
- New OrchestratorEvent table + repo methods (purpose-built sibling
to Log so synthetic events stay separable from attacker-driven ones).
- New orchestrator.{activity,file}.<decky_id> bus topics +
system.orchestrator.health heartbeat.
- SSH-only driver. Traffic action runs python3 inside src container
to TCP-connect dst:22 and read the SSH banner — real on-the-wire
SSH-protocol traffic without shipping creds. File action drops or
refreshes a small file via docker exec on the destination.
- Random scheduler (50/50 traffic/file when >=2 SSH-capable deckies
are running). Diurnal shaping, role-aware pairing, and session-aware
backoff are explicit non-goals for MVP.
- CLI registration, systemd unit (SupplementaryGroups=docker),
worker-registry entry so the dashboard shows orchestrator health.
- 11 tests: scheduler policy, driver argv shape + injection-safety,
end-to-end one-tick integration with FakeBus + SQLite.
API: /api/v1/campaigns (paginated list), /api/v1/campaigns/{uuid}
(soft-merge chain follow), /api/v1/campaigns/{uuid}/identities
(member identities), and /api/v1/campaigns/events (SSE under
campaign.> + JWT-via-?token=, snapshot-on-connect). Mirror of the
identity router; same auth, same shape, same OpenAPI tags pattern.
Frontend: CampaignDetail.tsx page (same visual vocabulary as
IdentityDetail), useCampaignStream hook (mirror of
useIdentityStream), /campaigns/:id route, IdentityDetail's
CAMPAIGN badge becomes clickable and navigates to the campaign.
useIdentityStream now listens for identity.campaign.assigned so
the badge appears live without a manual refresh.
Runs the chained identity + campaign clustering pipeline against all
seven fixtures via from_synthetic / from_synthetic_identity adapters
and ratchets every YAML floor to 1.0 — the production clusterer
(and the reference clusterers used in the per-fixture tests) all
score perfectly across ARI / homogeneity / completeness /
singleton_recall on each fixture.
Three substrate fixes surfaced by the ratchet:
- Tuning: shared_infra now Jaccards payload+C2 only; decky_set moved
into cohort_weight to prevent fleet-scarcity false-merges (F1's
shared_wordlist failure mode). Tier weight raised to 1.0 so
shared payload+C2 alone crosses threshold (F5's intended pass).
- Adapter: from_synthetic_identity now reads SyntheticSession
started_at + duration_s for session_windows and per-decky
timestamps (the production-row adapter still uses start_ts/end_ts
when available).
- Fixture data: paused_campaign.yaml's JA3 collided exactly with
vpn_hopping.yaml's (same TLS extension list). The collision
fused two unrelated campaigns under the chained identity layer
in the noise_floor composite. Made paused's JA3 distinct.
Also wires Campaign / CampaignsResponse into models/__init__.py's
__all__ that was missed in the schema commit.
The campaign clusterer worker mirrors the identity-side worker shell
(bus connect, heartbeat, control listener, slow-tick fallback) but
wakes on identity.> instead of attacker.> — campaign-level work is
gated on identity-layer changes, not raw observations.
The connected-components implementation reads identities via
list_identities_for_clustering, projects them with from_identity_row,
runs union-find over combined_campaign_weight, writes campaigns rows,
sets attacker_identities.campaign_id, and runs the same revocable-
merge pass as the identity layer (a merged-out campaign whose
identities no longer co-cluster with the winner gets revoked).
Bus: adds campaign.> family (formed / identity.assigned / merged /
unmerged) plus the cross-family identity.campaign.assigned so
existing identity-stream subscribers see the badge update without
having to subscribe to campaign.>. Wiki Service-Bus.md updated in
wiki-checkout in the same wave per the project's bus-signals
discipline.
CLI: decnet campaign-clusterer registered as master-only via
MASTER_ONLY_COMMANDS; --poll-interval / --daemon mirror the identity
clusterer command surface.
The signal taxonomy for the campaign clusterer (next commit). Mirror
of the identity-layer module but with edge families that don't
translate 1:1: phase-handoff (load-bearing for F5 multi_operator —
the signal the identity-side fingerprint-disagreement veto deliberately
isn't), shared-infra (vetoed at identity level, primary positive
signal here), temporal-overlap (pairwise-relative — F7 invariance
preserved), cohort (weak supporting weight only).
Tier weights tuned so phase-handoff alone crosses threshold (F5),
shared-infra + temporal-overlap together cross (canonical co-op
pattern), and shared-infra + cohort together do NOT (F1
shared_wordlist's failure mode). The F7 time-shift invariant is
explicitly tested on every time-bearing edge and on the combined
weight.
Adds the campaigns table and the BaseRepository / SQLModelRepository
methods that the campaign-clusterer worker (next commit) needs to
populate it. Mirrors the AttackerIdentity layer: schema_version from
day one for federation gossip, soft-merge via merged_into_uuid with a
chain-walking get_campaign_by_uuid, list_campaigns excluding merged-
out rows while list_all_campaigns returns the unfiltered set for the
revoke pass. attacker_identities.campaign_id gets a real FK now that
the target table exists.
Mirrors GET /api/v1/topologies/{id}/events: subscribes to identity.>
on the bus for the duration of the request and forwards each event as
a named SSE frame (formed / observation.linked / merged / unmerged).
The endpoint is broadly scoped (every identity event, not per-uuid)
because both AttackerDetail and IdentityDetail need the same
firehose: AttackerDetail watches for an identity.formed that finally
binds its identity_id; IdentityDetail watches for
observation.linked / merged / unmerged against its current row. A
per-uuid filter would force the client to know its identity before
subscribing, which it doesn't always.
JWT via ?token= (EventSource can't set headers), require_stream_viewer
gate, sse_connection_slot per-user cap, snapshot-on-connect with
the first 50 identities so the client buffer renders without a
separate REST call.
Bus-disabled / unreachable path keeps the connection alive on
keepalives so the client doesn't reconnect-storm; it can re-poll
the REST API on its own timer.
Reworks the clusterer's tick to handle multi-identity components and
re-evaluate prior merges. Two passes per tick:
Pass 1 — per-component reconciliation:
* Fresh component → mint identity (commit 4 path).
* Single-identity component → link unassigned observations.
* Multi-identity component → soft-merge: pick the smallest-uuid
winner deterministically, set merged_into_uuid on each loser,
link unassigned observations to the winner. Observations stay
FK'd to their original identity row — the merge is a soft
pointer, not a re-point. Audit trail preserved; cached
subscribers resolve through the chain.
Pass 2 — revocable-merge undo:
* For each merged-out identity, check whether its observations
still cluster with its winner's. If not, the merge is
contradicted by new evidence — clear merged_into_uuid and emit
identities_unmerged. The resurrected identity keeps its original
uuid, so subscribers that cached it during the merged interval
re-attach without a new lookup.
A pre-built merge-chain dict feeds Pass 1 so the effective-identity
lookup is O(1) per observation. The chain has a hop cap (paranoia
against accidental cycles in the underlying state).
Repo additions on BaseRepository + SQLModelRepository:
* list_all_identities() — includes merged-out rows.
* update_identity_merged_into(uuid, winner_or_None) — single
setter for both merge and unmerge.
DummyRepo coverage stub updated.
Tests:
* Two distinct identities bridged by a new observation merge with
the smaller uuid as winner.
* A pre-seeded soft-merge whose underlying observations diverge
gets revoked; resurrected uuid emerges with merged_into_uuid
cleared.
* Tick is idempotent under no state changes.
Two operators cooperating on one campaign can share C2 endpoints +
stage-1 payloads while running distinct tooling — fixture 5
(multi_operator) is the canonical demonstration. The identity
clusterer must NOT fuse them: shared infra is a campaign-level
signal, not an identity-level one. The campaign clusterer (downstream
work) handles that grouping over identities.
Mechanism: when two observations have non-null fingerprints AND the
fingerprints fully disagree, the high-weight tier drops the payload
and C2 contributions to zero. JA3 / HASSH agreement still returns
1.0 directly — no veto applies when something agrees. Partial
agreement (one slot agrees, another disagrees) is treated as
agreement, since stable-tool partial overlap is more consistent
with one identity than two.
The veto only triggers when there is actual disagreement evidence —
two un-fingerprinted observations sharing a C2 still cluster, since
the absence of fingerprints is not the same as disagreement on them.
Fixture 5 production-clusterer assertion added at identity level:
ARI = 1.0, homogeneity = 1.0, exactly 2 predicted clusters from
2 truth identities. Phase-handoff edges (from the TODO) belong to
the downstream campaign clusterer, not this identity clusterer.
The clusterer now drops a single high-tier function call in favor of
a tier-weighted sum. Tier multipliers (high=1.0, medium=0.6, low=0.2,
very_low=0.05) are tuned so the threshold (1.0) admits high-tier
agreement alone while leaving every weaker tier — and every
combination of weaker tiers — under threshold.
Per-tier discipline tested:
- high alone clusters
- medium alone does NOT cluster (supporting signal only)
- low alone does NOT cluster (fixture 1's failure mode)
- very-low alone does NOT cluster (fixture 2's failure mode)
- all three weak tiers stacked still don't reach threshold
- high + medium clusters (high already saturates)
The combination is forward-compatible: low + very-low contributions
are computed today but always project to 0.0 because the production
adapter doesn't populate credentials / ASN-edge inputs into the
fixture path yet. Their contribution becomes load-bearing in commit 7
when the low-tier landing tightens the F1 / F2 bounds.
Fixture 4 (paused_campaign) ratchet added: high-tier signal carries
the multi-day-silence campaign into one identity. Time-agnostic
invariant — silence is irrelevant to the edge weight.
The connected-components clusterer now writes attacker_identities
rows + sets attackers.identity_id when high-weight signals (JA3 /
HASSH / payload-hash / C2-endpoint exact match) agree across
observations. Singletons stay un-fingerprinted and un-clustered.
Algorithm split:
- cluster_observations(observations) — pure union-find over the
high-weight edge function. Same code path for fixture validation
and production tick.
- from_attacker_row(row) — production-row adapter; recovers JA3 +
HASSH from Attacker.fingerprints JSON. Payload + C2 join from
logs in later commits; the function shape doesn't change.
Repo additions on BaseRepository + SQLModelRepository:
- list_attackers_for_clustering(limit=None)
- create_attacker_identity(row)
- set_attacker_identity_id(attacker_uuid, identity_uuid)
DummyRepo coverage stub updated.
v1 behavior is conservative: only assigns identities to observations
whose identity_id is currently NULL. Multi-identity components are
skipped this pass — merge / re-assign lands in commit 10 with
revocable merges.
Fixture bounds tightened against the production clusterer:
- lone_wolf (F3) — singletons stay singletons
- shared_wordlist (F1) — credential-only overlap doesn't cluster
(high-weight tier doesn't include credentials)
- vpn_hopping (F2, identity-level) — 5 rotated IPs with stable JA3
+ HASSH fold into one identity, ARI = 1.0, completeness = 1.0
Adds the four weight-tier edge functions as pure, time-agnostic
scoring primitives over an Observation projection. Each returns a
score in [0, 1]; the connected-components impl will combine + threshold
in subsequent commits.
Tier semantics (from IDENTITY_RESOLUTION.md):
- high — JA3/HASSH/payload-hash/C2-endpoint exact match
- medium — phase-bucketed command-sequence Jaccard
- low — credential-attempt-set Jaccard (defeated alone by F1)
- very low — ASN equality (defeated alone by F2)
Time-agnostic invariant is a static test: Observation has no time
fields, so no edge function can silently start using them. Fixture 7
forbids recency-decay clustering on multi-month APT campaigns.
A from_synthetic() adapter projects SyntheticAttacker corpora into
Observation; the production-row adapter lands when the clusterer
starts reading the attackers table.
Revocable merges (a contradiction-driven undo of identity.merged) ship
in the clusterer work; this reserves the topic up-front so identity.>
subscribers receive it day one without a re-subscribe.
The clusterer worker's ClusterResult fan-out now publishes on
identity.unmerged when populated. The skeleton clusterer never
populates it; the revocable-merge commit will.
Wiki update lives in wiki-checkout/Service-Bus.md (separate repo).
Adds the decnet clusterer master-only command + provider-subpackage
shape (base.py + factory.py + impl/connected_components.py) so
subsequent commits can land similarity-graph features without
churning callers.
The skeleton ConnectedComponentsClusterer.tick is a no-op; the
worker shell is fully wired (bus consumer on attacker.observed +
attacker.scored, slow-tick fallback, health heartbeat, control
listener, ClusterResult fan-out to identity.formed/observation.linked
/merged). Subscribers on identity.> see no events from this clusterer
until edge functions land, but the lifecycle is in place.
Fourth of the five-step identity-resolution substrate. Constants and
builder ship now; no publishers exist yet — they land with the
clusterer worker. Subscribers (webhook worker, dashboard SSE relay)
can register against identity.> from day one.
* decnet/bus/topics.py — IDENTITY root + IDENTITY_FORMED /
IDENTITY_OBSERVATION_LINKED / IDENTITY_MERGED leaves; identity()
builder mirroring the attacker() / system() helpers. Module
docstring topic-tree updated.
* tests/bus/test_topics.py — assert builder produces the expected
three topic strings + rejects empty event_type.
Wiki Service-Bus.md and a new Identity-Resolution.md page land in the
companion wiki-checkout commit.
Second of the five-step identity-resolution substrate. Ships the API
surface against the empty AttackerIdentity table from commit 1 — every
endpoint returns empty/404 cleanly until the clusterer populates rows.
Routes (auth-gated, viewer role):
* GET /api/v1/identities — paginated list, excludes merged-out rows
* GET /api/v1/identities/{uuid} — detail; transparently follows
merged_into_uuid to surface the canonical winner
* GET /api/v1/identities/{uuid}/observations — Attacker rows FK'd
to the (resolved) identity uuid
Repository (BaseRepository abstract + SQLModelRepository concrete):
* get_identity_by_uuid (with merge-chain following, hop-bounded)
* list_identities / count_identities (excluding merged-out)
* list_observations_for_identity / count_observations_for_identity
Tests: 12 new (empty-table behavior, seeded data, merge-chain
resolution, repo-level smoke against real SQLite). Also fixes the
pre-existing test_base_repo_coverage failure (DEBT-041 added abstract
methods without updating the DummyRepo stub) — included here because
this PR adds 5 more abstract methods, fixing it as a bonus.
474 db/web/profiler/correlation tests green.
Schema-only commit, first of the five-step substrate for identity
resolution. The clusterer that populates identities lands later; this
ships the table empty and the FK uniformly NULL on existing rows.
* decnet/web/db/models/attackers.py — new AttackerIdentity SQLModel
(uuid PK, schema_version, fingerprint summary lists, kd_digraph_simhash,
merged_into_uuid self-FK, all clusterer-populated fields nullable).
Attacker grows a nullable indexed identity_id FK + docstring marking
it as the per-IP observation row.
* decnet/web/db/models/__init__.py — re-exports AttackerIdentity.
* tests/db/test_identity_schema.py — 9 schema invariants: table exists,
identity_id nullable + indexed, FK targets attacker_identities.uuid,
schema_version defaults to 1, attacker rows inserted with NULL
identity_id, FK constraint blocks orphans.
463 unrelated db/web/profiler/correlation tests still green. See
development/IDENTITY_RESOLUTION.md for the full design.
Pre-implementation scaffolding for campaign clustering. The simulator is
the spec — algorithm code follows once fixtures + metrics are stable.
* decnet/clustering/ukc.py — UKCPhase enum (19 phases across In/Through/Out
stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns
with future MITRE ATT&CK tagging so synthetic data and runtime phase
inference don't need renaming when TTP-tagging lands.
* tests/factories/campaign_factory.py — YAML DSL parser + deterministic
generator emitting truth-labeled SyntheticAttacker / SyntheticSession
records. Validates phase names, warns on unobservable phases, supports
multi-campaign + noise corpora.
* tests/clustering/metrics.py — pure-Python ARI / homogeneity /
completeness / singleton_recall (no sklearn dep). Decided before any
algorithm exists, on purpose.
* tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml} — fixture 3
from the design doc; simplest of the six, exercises the full pipeline
with an identity-clusterer placeholder.
* development/CAMPAIGN_CLUSTERING.md — design spec for the feature.
* development/DEVELOPMENT_V2.md — note on DSL evolution path
(concurrent phases, multi-actor per phase) deferred post-v1.