Commit Graph

475 Commits

Author SHA1 Message Date
34c85346a6 feat(deploy): seed canary baseline at deploy time + tests
Hooks decnet.canary.planter.seed_baseline into the deploy() flow's
fleet-mirror step. After upserting a FleetDecky as 'running' we seed
the configured baseline canary set on the freshly-deployed decky.

Persona detection: read d.nmap_os (Windows -> windows path-mapping,
otherwise linux). Failures are logged and surface as state=failed
rows in the UI; the deploy itself MUST NOT abort (resilience
principle in CLAUDE.md).

Tests confirm:
- seed_baseline produces one row per configured generator per decky;
- the deployer source wires seed_baseline inside a try/except so a
  failure can't abort the deploy.
2026-04-27 13:19:08 -04:00
6c4ea706f8 feat(api): canary token CRUD router (/api/v1/canary) + tests
Two sub-routers under /api/v1/canary:

blobs (operator-uploaded artifacts, deduped by sha256):
- POST   /blobs          (multipart upload; admin)
- GET    /blobs          (list with token_count; admin)
- DELETE /blobs/{uuid}   (refcount-aware; 409 when referenced; admin)

tokens (per-decky planted artifacts):
- POST   /tokens                          (generate or instrument + plant; admin)
- GET    /tokens?decky_name=&kind=&state= (filter; viewer)
- GET    /tokens/{uuid}                   (detail; viewer)
- GET    /tokens/{uuid}/preview           (instrumented bytes; admin)
- GET    /tokens/{uuid}/triggers          (paged callback log; viewer)
- DELETE /tokens/{uuid}                   (revoke + bus event; admin)

XOR validation: exactly one of blob_uuid / generator must be set.
Path validation rejects relative/NUL/newlines/.. segments. Every
body-bearing route documents 400 plus 401/403/404 as applicable.

Stdlib MIME sniffer (no python-magic dep) covers PNG/JPEG/GIF/PDF/
HTML/XML/DOCX/XLSX/JSON/YAML/TOML/text/plain; everything else falls
through to passthrough.

Tests run end-to-end through the live FastAPI app (planter docker
exec is patched); 17 cases covering dedup, refcount, lifecycle,
XOR validation, path validation, and 404 paths.
2026-04-27 13:18:00 -04:00
f9513bb7dd feat(cli): register decnet canary subcommand + tests
decnet canary launches the HTTP + DNS callback receiver via
decnet.canary.worker.run. Mirrors the shape of decnet webhook
(typer command with --daemon flag, asyncio.run in the foreground).

Deliberately NOT added to MASTER_ONLY_COMMANDS — every host that
hosts deckies runs its own canary worker, and the bus events stay
local to that host (per-host webhook fanout handles SIEM egress).
2026-04-27 13:13:23 -04:00
fae3e0caa3 feat(canary): worker (HTTP + stdlib DNS callback receivers) + tests
decnet canary worker hosts both callback surfaces in one process:

- HTTP: a tiny FastAPI app on its own port (default 8088). The only
  meaningful route is GET /c/{slug} which looks up the slug, persists
  a CanaryTrigger, publishes canary.<id>.triggered, and returns a 1x1
  transparent GIF. Unknown slugs return the same response (stealth);
  no decnet strings leak in headers/banners; docs/openapi/redoc are
  disabled. X-Forwarded-For is honored.

- DNS: an authoritative UDP server for *.<canary_zone> using
  asyncio.DatagramProtocol with stdlib-only DNS wire-format parsing
  (no dnslib dep). Same lookup -> persist -> publish flow, plus a
  sinkhole A record (192.0.2.1) so the attacker's resolver doesn't
  loop on NXDOMAIN. Single-label slugs only; multi-label probes
  return NXDOMAIN. Pointer loops in malformed queries are caught
  (10-hop cap) so an adversarial packet can't wedge the parser.

Tests cover both surfaces without privileged sockets:
- HTTP via Starlette TestClient: known/unknown slug, headers, XFF,
  stealth-string assertions.
- DNS via direct DatagramProtocol drive: known slug -> ANSWER,
  unknown -> NXDOMAIN, pointer-loop -> ValueError, malformed
  packet -> silent drop.
2026-04-27 13:12:05 -04:00
8fb9bc5545 feat(canary): planter (docker exec injector) + tests
Plant / revoke / seed_baseline using the same docker-exec-with-sh-c
pattern proven by decnet/orchestrator/drivers/ssh.py:_run_file.

Each plant call composes a single sh script:
  mkdir -p <dirname> && printf %s <base64> | base64 -d > <path> &&
  chmod <mode> <path> && touch -d @<mtime> <path>

Base64-on-the-host / decode-in-the-container keeps binary artifacts
(DOCX/PDF/PNG) safe across the argv boundary; the placement_path,
mode, and mtime are shlex-quoted.

State transitions hit the repo: planted -> failed on docker error
with stderr captured into last_error. Bus events fire on success
(canary.<id>.placed) and on revoke (canary.<id>.revoked) — wrapped
in try/except so a downed bus never blocks a placement.

seed_baseline(decky_name, repo) is the deploy-hook entry point —
reads DECNET_CANARY_BASELINE (default git_config,env_file,honeydoc,
aws_creds), persists one row per generator, plants each. Failed
placements are logged but do NOT abort; the deployer hook treats
the return list as informational.
2026-04-27 13:08:18 -04:00
19ceff4417 feat(canary): operator-upload instrumenters + tests
Seven instrumenters that mutate operator-supplied artifacts to
embed the callback URL:

- passthrough — bytes unchanged; only DNS-callback tokens trip
  detection, with the slug embedded in the placement path
- plain      — substitutes {{CANARY_URL}}/{{CANARY_HOST}} placeholders;
  falls back to appending a comment line whose prefix adapts to the
  apparent file syntax (#, //, ;)
- html       — injects a 1x1 tracking pixel before </body>, appends
  if the close tag is missing
- docx       — direct zipfile manipulation (no python-docx dep):
  inserts an external-image Relationship into word/_rels/document.xml.rels
  and a matching <w:drawing> element before </w:body>
- xlsx       — sibling of docx; injects an external-image relationship
  into xl/_rels/workbook.xml.rels (orphan rels are still fetched on
  open by most viewers)
- pdf        — uses pikepdf to install /OpenAction /URI on the catalog;
  rejects with a clear message when pikepdf isn't installed
- image      — uses Pillow to embed slug + URL in PNG tEXt / JPEG
  comment; rejects with a clear message when Pillow isn't installed

DOCX and XLSX share the rId allocator + relationship injector via
the docx module; both work on stdlib zipfile only.

Tests synthesise minimal real DOCX/XLSX fixtures inline, round-trip
each instrumenter, and assert the callback URL ends up in the
mutated bytes while the file still parses.
2026-04-27 13:03:42 -04:00
c7658ea65e feat(canary): synthesised-artifact generators + tests
Five built-in generators that produce deterministic fake artifacts
keyed by the token slug:

- aws_creds  — passive [default]/[prod] credentials block, no
               callback wiring (AWS-key tokens require an external
               trap, which is post-v1)
- git_config — .git/config with origin url = http_base/c/<slug>/repo.git
- env_file   — .env with API_BASE_URL + WEBHOOK_NOTIFY_URL embedding
               the callback URL plus inert realism filler
- ssh_key    — PEM-shaped fake private key whose host comment carries
               <slug>.<dns_zone> when DNS is deployed, else the
               http_base host
- honeydoc   — minimal HTML report with a 1x1 tracking-pixel <img>
               whose src is the callback URL; fallback for the
               deploy-time baseline before the operator uploads a
               real DOCX/PDF

Tests assert byte-stability (same ctx -> same bytes), slug presence
in the embedded fields, that aws_creds is intentionally URL-free,
and that every artifact carries operator-facing notes for the
preview endpoint.
2026-04-27 12:59:19 -04:00
8f19adecfe feat(canary): package scaffolding (base/factory/paths/storage) + tests
Mirrors the decnet.intel layout (base + factory + lazy concrete
imports). Defines:

- CanaryArtifact / CanaryContext dataclasses + the generator and
  instrumenter ABCs they share
- factory dispatch for generators (git_config/env_file/ssh_key/
  aws_creds/honeydoc) and instrumenters (docx/xlsx/pdf/html/image/
  plain/passthrough), plus pick_instrumenter_for_mime() for MIME-driven
  dispatch on operator uploads
- persona-aware default placement paths (Linux vs. Windows-shaped)
  and absolute-path validation that the API will use to validate
  operator-supplied placement_path values
- on-disk blob store: sha256-keyed two-level fan-out, idempotent
  writes, refcount-aware unlink (the DB row is the source of truth)

Also covers prior commits' tests (bus topics, models, repo CRUD)
under tests/canary/. 79 tests, all pass.
2026-04-27 12:56:01 -04:00
6a0d140e91 feat(db): canary token repository CRUD
Adds the abstract surface on BaseRepository and the SQLModel-backed
implementation (shared by SQLite and MySQL) for:

- canary blobs (upsert-by-sha256, list-with-refcount, refcount-aware delete)
- canary tokens (create, slug lookup, list with filters, state update)
- canary triggers (record+bump-counters atomically, list, attribute)

The triggers path is a single session that inserts the row and bumps the
parent token's counters together, so a subscriber that reads the token
right after the bus event sees the updated count. Blob delete refuses
while any token (including revoked) still references the blob; pre-v1
revoked tokens stick around for forensic value.
2026-04-27 12:48:24 -04:00
813f14bf2a feat(db): canary token tables (blob/token/trigger)
Three new tables for the canary tokens feature:

- canary_blobs       — operator-uploaded source artifacts, deduped by sha256
- canary_tokens      — one planted artifact in one decky; carries the
                       callback slug, generator/instrumenter, and lifecycle
- canary_triggers    — append-only log of every callback hit; attacker_id
                       back-filled by the correlator

Pydantic request/response shapes live in the same file per the
single-source-of-truth convention. No migrations file — pre-v1
SQLModel.metadata.create_all() covers it.
2026-04-27 12:45:41 -04:00
914c911984 feat(bus): canary token bus topics (placed/triggered/revoked)
Reserved topic family for the upcoming canary-tokens feature so the
correlator and webhook fanout can subscribe to canary.> from day one.
No producers yet; planter, decnet canary worker, and API will publish
in subsequent commits.
2026-04-27 12:43:23 -04:00
828165783e feat(templates): standalone NTLMSSP Type 3 parser + decnet-init wrapper
* decnet/templates/{rdp,smb}/ntlmssp.py — minimal Type 3 (Authenticate)
  parser shared between the SMB and RDP-NLA templates.  Lands NTLM
  creds in the universal Credential table with secret_kind=ntlmssp_v1
  / ntlmssp_v2 and secret_b64 = base64 of the NtChallengeResponse so
  the bounty pipeline can feed the right hashcat mode.
* scripts/decnet-init.sh — convenience wrapper around `sudo decnet init
  --force` that targets the current working directory; saves operators
  retyping the install paths during dev iterations.
2026-04-27 10:12:30 -04:00
f046634d6e feat(web): Persona Generation page under AUTOMATION
New dashboard surface for editing the global emailgen persona pool —
the JSON file fleet (MACVLAN/IPVLAN) and SWARM-shard mail deckies pull
from.  MazeNET topology personas are out of scope here; they're
configured per-topology in the topology editor.

Backend:
* GET/PUT /api/v1/emailgen/personas — admin-write, viewer-read.  PUT
  validates with the same Pydantic schema the worker uses
  (parse_personas), drops invalid entries with a warning, returns 400
  only when the entire payload fails.  Path is operator-discoverable
  on every response so a CLI-driven backup workflow stays visible.

Frontend:
* PersonaGeneration.tsx + .css — table + add/edit modal with the full
  EmailPersona schema (name, email, role, tone, mannerisms list,
  language, signature, active hours, reply latency, uses_llms_heavily).
  Local edits are batched; explicit "SAVE CHANGES" writes back, with a
  dirty-indicator pill and a "DISCARD" reset.  Email uniqueness is
  enforced client-side so the scheduler never picks the same persona
  as both sender + recipient.
* Sidebar AUTOMATION group gains a "Persona Generation" entry next to
  Orchestrator; route registered at /persona-generation.

The worker reads the same on-disk file the API writes — see
decnet.orchestrator.emailgen.global_pool.  The API resets the
in-process cache on every read/write so the worker picks up dashboard
edits within its next tick rather than waiting on mtime.
2026-04-27 09:55:42 -04:00
818aebadfc feat(web): emailgen events in Orchestrator page
The SSE pipe at /orchestrator/events/stream was already streaming
'orchestrator.email.{decky_uuid}' events (the subscription is for the
'orchestrator.>' wildcard), but the consumer side dropped them on the
floor.  Three fixes to close the loop:

* useOrchestratorStream.ts now registers an 'email' SSE listener — the
  EventSource silently ignores frames whose event name has no listener,
  so missing this entry meant every email frame was dropped before
  reaching the page's onEvent handler.

* /api/v1/orchestrator/events accepts kind=email and dispatches to
  list_orchestrator_emails, adapting rows to the existing wire shape:
  subject -> action, sender_email -> src_decky_uuid, recipient_email
  -> dst_decky_uuid, plus email-specific extras (thread_id, language,
  mail_decky_uuid, message_id, in_reply_to) ride along as top-level
  keys.

* Orchestrator.tsx gains an 'email' tab in the kind filter and a
  branch in the row renderer / inspector that:
   - shows full sender / recipient (no UUID truncation),
   - chips the language code next to the subject,
   - relabels ACTION as SUBJECT in the inspector and surfaces
     thread / in-reply-to / mail-decky details.

The 'all' tab continues to show traffic+file only (today's behavior);
operators see emails by switching to the email tab.  A union view at
the API layer is the obvious follow-up but not necessary for now.
2026-04-26 22:56:48 -04:00
73692b52f0 feat(emailgen): gate as master-only
Two-layer gating per CLAUDE.md:
- registration-time: emailgen added to MASTER_ONLY_GROUPS so agents
  don't see the sub-app in 'decnet --help' at all.
- body-guard: _require_master_mode('emailgen ...') at the top of every
  sub-command body so a direct callable import (third-party tooling)
  still bails on agent hosts.

Matches the convention used for 'swarm', 'topology', 'geoip'.  SWARM
agents push their generated mail through the master's emailgen worker
(or none at all); cross-agent emailgen federation stays out of scope.
2026-04-26 22:45:59 -04:00
6d520eaa6f refactor(emailgen): pluggable LLM backend (base/factory/impl)
Lift the Ollama subprocess shell-out out of EmailDriver and into a
proper provider subpackage shape:

  decnet/orchestrator/emailgen/llm/
    base.py        — LLMBackend Protocol + LLMResult + LLMTimeout
    factory.py     — get_llm() reads DECNET_EMAILGEN_LLM
    impl/ollama.py — current 'ollama run' subprocess path
    impl/fake.py   — canned-output backend used by tests

Driver now takes an LLMBackend on construction (or inherits the
factory default).  Tests inject FakeBackend instead of monkeypatching
the subprocess layer, which is cleaner and ~10x faster.  Swapping
Ollama for the Anthropic API / vLLM / llama.cpp is now a third branch
in factory.py; no driver rewrite needed.

Mirrors the convention used by decnet.web.db.factory + decnet.bus.factory
per the provider-subpackages-from-day-one rule in memory.
2026-04-26 22:43:36 -04:00
4badc75fb2 feat(emailgen): global persona pool + Date-stamped EML mtimes
Two changes that unwind earlier MazeNET-only assumptions and fix a
realism tell:

1. Persona resolution is now per-decky-source, not topology-only.  The
   scheduler walks the union view (list_running_deckies, including
   fleet MACVLAN/IPVLAN + SWARM shards) and picks the right persona
   list for each source:
     * topology decky -> Topology.email_personas (per-topology richness
       preserved)
     * fleet / shard  -> a single host-wide pool loaded from disk
       (DECNET_EMAILGEN_PERSONAS, /etc/decnet/email_personas.json, or
       ~/.decnet/email_personas.json)
   Operators install the global pool via 'decnet emailgen
   import-personas <file>' which validates with the same Pydantic
   schema the worker uses.

2. The driver now runs 'touch -d <Date>' inside the docker exec right
   after the EML write so file mtime matches the email's RFC 2822
   Date: header.  Without this an attacker 'ls -lt'ing the spool sees
   every email clustered inside the worker's tick window — the
   cluster itself was a stylometric tell.

CLI now exposes 'decnet emailgen' as a sub-app with 'run' (default,
backwards-compatible with bare 'decnet emailgen') and 'import-personas'.
list_running_deckies carries topology_id through so consumers can resolve
the parent topology without a second round-trip.
2026-04-26 22:39:16 -04:00
2979997442 feat(templates): IMAP/POP3 servers read EML spool from emailgen
When IMAP_EMAIL_SEED / POP3_EMAIL_SEED points at a directory of .eml
files (the orchestrator emailgen worker's drop path,
/var/spool/decnet-emails/ by convention), the bait mailbox is replaced
with those LLM-generated, persona-driven, threaded messages.  Empty /
missing dir keeps the hardcoded fallback so a fresh deployment is never
silent.  Cached with mtime invalidation + a short TTL so a hot mailbox
doesn't pay the parse cost on every IMAP/POP3 command.

Replaces the DEBT-026 stub on both templates that named the env var but
never wired it through.
2026-04-26 22:21:01 -04:00
3ee55ec341 feat(emailgen): Ollama-driven fake email worker for IMAP/POP3 deckies
Second orchestrator worker (decnet emailgen) that drips persona-driven,
threaded, multi-language fake emails into running mail deckies.  Personas
live on Topology.email_personas; topology-wide language_default falls
through to any persona that doesn't pin its own.  Em-dashes are
suppressed at the prompt layer by default and only lifted for personas
explicitly marked uses_llms_heavily — em-dashes are an LLM tell and a
flat corpus of em-dashed mail is a giveaway.

EML delivery writes into /var/spool/decnet-emails/<thread>/<msg>.eml on
the mail decky via docker exec; wiring the IMAP/POP3 templates to read
from that spool (replacing the hardcoded _BAIT_EMAILS) is the next step.
2026-04-26 22:16:19 -04:00
9650366d34 fix(orchestrator): drop topology_deckies FK on event src/dst columns
Once the orchestrator started seeing fleet + SWARM shard sources via
list_running_deckies (a844148), every event row landing on a fleet decky
broke the FK to topology_deckies — the column now carries opaque ids
("local:omega-decky" for fleet, "host_uuid:decky_name" for shards) that
will never match topology_deckies.uuid.

Symptom on the operator's mothership:
  IntegrityError 1452 — orchestrator_events_ibfk_2 FK violated on every
  tick once the reconciler populated fleet_deckies.

Index on dst_decky_uuid is preserved (the dashboard reads
"events for this decky" frequently); only the FK is removed.  Keeps
data integrity loose by design — events are append-only history that
should outlive the deckies they reference.

Existing MySQL deployments need the FK dropped manually:
  ALTER TABLE orchestrator_events
    DROP FOREIGN KEY orchestrator_events_ibfk_2,
    DROP FOREIGN KEY orchestrator_events_ibfk_1;

SQLite users are unaffected — SQLite doesn't enforce FKs by default.
2026-04-26 21:40:06 -04:00
c3518e3159 feat(workers): surface clusterer, campaign-clusterer, reconciler in panel
The Workers panel (Config → Workers tab) hardcodes its row list in
KNOWN_WORKERS — by design, so a rogue publisher can't inject UI rows.
Three heartbeat-emitting workers were missing:

  * clusterer            — behavioral clustering (decnet/clustering/)
  * campaign-clusterer   — campaign assembly  (decnet/clustering/campaign/)
  * reconciler           — host-local fleet convergence (added in 430262e)

Each already publishes on system.<name>.health via run_health_heartbeat,
so they show up live the moment they're added to the registry — no
frontend or subscriber wiring needed (Config.tsx renders whatever
/workers returns).

Also added to _PREFERRED_ORDER in start-all so START ALL WORKERS brings
them up in dependency-friendly order: data-plane → reconciler → intel
→ clustering → output → orchestrator.

Three deployable units (listener, web, swarmctl) intentionally remain
absent from KNOWN_WORKERS — they don't emit heartbeats (CLI / static
server / one-shot tooling), so they'd permanently render as UNKNOWN
and confuse operators.  Adding them is a separate decision that needs
a "synthesize installed-but-silent rows" pass on the registry.
2026-04-26 21:31:34 -04:00
430262e01a feat(fleet): systemd unit + bus signal for fleet reconciler
Two pieces, one PR because they share a deployment surface:

1. systemd. decnet-reconciler.service.j2 mirrors the orchestrator unit
   shape (docker group, hardened sandbox, append-logs).  Read-only
   /var/lib/decnet so it can read decnet-state.json without write
   access.  Auto-discovered by `decnet init` via the existing
   decnet-*.service.j2 glob — no init.py change needed.  Added to
   decnet.target so `systemctl start decnet.target` brings it up
   alongside collector / sniffer / mutator / etc.  Also added to the
   agent reaper script so self-destruct cleans it up on workers.

2. Bus signal. reconcile_once now publishes
   `decky.<host_uuid:name>.state` on every insert / delete /
   state-changed transition.  Reuses the existing DECKY_STATE topic
   family (no bus/topics.py change → no wiki update needed per the
   bus-signals doc rule).  Composite host_uuid:name segment keeps
   fleet rows distinguishable from MazeNET TopologyDecky rows whose
   ids are bare UUIDs.  Quiet ticks publish nothing — convergence
   means silence.

Bus is plumbed through the worker, defaults to None for unit-test
callers.  publish_safely keeps the source-of-truth contract: DB write
is authoritative, the publish is best-effort notification.

Captures previous_state into a local before update_fleet_decky_state
runs — a fake repo that mutates rows in-place would otherwise see the
post-update state and report previous == current.  Real repos don't
have this concern but the fix is cheap and makes the function less
order-dependent.
2026-04-26 21:21:36 -04:00
a8441481b5 fix(orchestrator): see fleet + shard deckies, not just topology rows
Switches _one_tick from list_running_topology_deckies to
list_running_deckies (the union view added in 095500a). Resolves the
permanent "no actionable deckies (running+ssh count=0)" log on hosts
running only unihost MACVLAN / IPVLAN decoys — the orchestrator now
sees fleet_deckies rows alongside MazeNET topology rows and SWARM
DeckyShard rows.

Also fixes the misleading log message: the old "running+ssh count=N"
reported the *pre-filter* total (count of all running deckies, not
the SSH-eligible subset that scheduler.pick actually evaluates). New
line breaks down running, ssh_eligible, and per-source counts so
debugging "why isn't it picking?" no longer requires reading
scheduler internals.

Regression test: orchestrator integration suite now seeds fleet_deckies
rows (not just topology_deckies) and verifies a tick picks them and
records an event with dst="local:fleet-*" — proving the original bug
on the operator's mothership is fixed.
2026-04-26 21:16:22 -04:00
f775223a83 feat(fleet): reconciler converges JSON ↔ DB ↔ docker
Adds decnet.fleet.reconciler — a pure async function plus a long-lived
worker — that periodically reconciles the three sources of truth on a
DECNET host:

  1. decnet-state.json (CLI-canonical fleet record)
  2. fleet_deckies table (DB mirror, written by engine.deployer)
  3. docker inspect (actual per-container runtime state)

Drift handling:
  * JSON has X, DB doesn't       → INSERT (deploy ran with DB offline)
  * DB has X (this host), JSON doesn't → DELETE (teardown ran with DB offline)
  * Both have X, docker disagrees → flip state to running/failed/degraded
  * Docker socket unreachable    → leave existing state alone (don't
                                    torch every row to torn_down)

Cross-host safety: deletions are scoped to host_uuid for the local host;
a master that runs both a local fleet and swarm workers will never
clobber a peer's slice.

CLI:
  decnet reconcile --once            # one-shot, prints counts
  decnet reconcile [--interval N]    # long-lived worker, mirrors
                                     # orchestrator's lifecycle (control
                                     # listener + heartbeat + tick loop)

Promotes decnet/fleet.py → decnet/fleet/ package so the reconciler can
live alongside it without name collision (build_deckies_from_ini and
all_service_names re-exported unchanged via __init__.py).

14 new tests cover state aggregation rules, all four drift directions,
host_uuid scoping, docker-unreachable safety, and worker shutdown via
the bus control event.
2026-04-26 21:14:48 -04:00
8814902999 docs(api): clarify fleet_deckies + JSON dual-write happens in engine.deployer
The unihost API path delegates to engine.deployer.deploy(), which now
writes both decnet-state.json (existing) and the fleet_deckies DB
table (added in 646aeec).  Comment makes the single-sink design
explicit so future maintainers don't add a parallel save_state /
upsert_fleet_decky call here.

No behavioral change — every fleet-creation path on every host (CLI
deploy, this unihost API path, and per-worker SWARM agent deploys)
already routes through the engine.deployer single sink.
2026-04-26 21:08:44 -04:00
646aeeca40 feat(deployer): mirror fleet deploy/teardown into fleet_deckies table
CLI deploy now writes both surfaces: decnet-state.json (existing,
canonical for offline / no-API hosts) and the new fleet_deckies DB
table (visible to orchestrator, web dashboard, REST API).

Best-effort: a DB outage logs a warning and returns. The JSON file
remains the source of truth for `decnet status`, `decnet teardown`,
sniffer, and collector — operators on a CLI-only host keep working.

_run_async helper bridges sync deploy() into the async repository.
Always uses a fresh thread because the API handler at
web.router.fleet.api_deploy_deckies invokes deploy() from inside a
FastAPI event loop, which would otherwise break asyncio.run.

Verified end-to-end against MySQL: deploy mirror inserts rows, union
view (list_running_deckies) returns them with source="fleet",
teardown mirror removes them. Works from both sync (CLI) and async
(API handler) call sites.
2026-04-26 21:05:50 -04:00
095500ae9a feat(db): FleetDecky table mirrors decnet-state.json into the DB
Adds a fleet_deckies table so DB-only consumers (orchestrator, web
dashboard, REST API) can see unihost / MACVLAN / IPVLAN deckies
without reading the JSON state file. Mirrors DeckyShard field-for-field.

Composite PK (host_uuid, name) future-proofs for a mothership that
runs both a local fleet and acts as a swarm master. host_uuid defaults
to the "local" sentinel — no FK to swarm_hosts because the local
mothership isn't enrolled as a worker.

Repo additions: upsert_fleet_decky, delete_fleet_decky,
list_fleet_deckies, list_running_fleet_deckies,
update_fleet_decky_state, plus list_running_deckies which unions
topology + fleet + shard sources for the orchestrator.

Smoke-tested round-trip against MySQL: upsert, list_running, union
view (source="fleet"), delete.
2026-04-26 21:00:01 -04:00
c595d039bd feat(sniffer): ISN sequence classifier (reuses seq_class helper)
Mirrors the IP-ID classifier for TCP ISN values: per-source-IP rolling
deque (maxlen=8) populated from each inbound SYN's tcp.seq, classified
on every emission. A 'random' verdict is the modern norm; 'incremental',
'zero', or 'constant' indicates legacy stacks or hand-rolled raw-socket
tooling — a strong fingerprint signal.

Active prober now also captures server_isn (single sample, not classified
in-flight; downstream consumers correlating multi-probe results can apply
seq_class.classify_sequence themselves).

Profiler rollup carries the latest non-'unknown' label into
attacker.tcp_fingerprint. Dedup key already covers isn_class from
the previous commit, so transitions emit cleanly.

UI surfaces ISN class as a colour-coded tag with a ⚠ glyph for
non-random verdicts, since they're the genuinely interesting case.
2026-04-26 20:30:24 -04:00
0e40cc8ae1 feat(sniffer): IP-ID sequence classifier (random/incremental/zero/constant)
Adds a per-source-IP rolling sample buffer (deque, maxlen=8) for IP-ID
values seen on attacker SYNs and a stdlib-only classifier in
decnet/sniffer/seq_class.py. Each new SYN appends ip.id and re-classifies
the buffer; the result is logged on tcp_syn_fingerprint events alongside
sample count.

The dedup key now folds in ipid_class so a transition from 'unknown' to
a definitive verdict emits exactly one fresh event instead of being
suppressed by the old (os|options) key. Profiler rollup carries the
latest non-'unknown' label into attacker.tcp_fingerprint.

UI surfaces it as a colour-coded tag in the TCP STACK panel: random
neutral, incremental amber, zero/constant green (the strong signal).
2026-04-26 20:28:32 -04:00
b0b08754d0 feat(fingerprint): ToS/DSCP/ECN extraction in active + passive TCP fingerprint
Active prober now reads ip.tos from the SYN-ACK and emits tos/dscp/ecn
alongside the existing TTL/window/options fields. dscp is folded into the
fingerprint hash so different DSCP markings produce distinct signatures.

Passive sniffer logs the same three fields on tcp_syn_fingerprint events;
profiler rollup carries them into the attacker tcp_fingerprint snapshot;
AttackerDetail's TCP STACK panel now surfaces DSCP and ECN cells.
2026-04-26 20:25:37 -04:00
3de19eb102 feat(orchestrator): periodic prune of orchestrator_events
Every 100 ticks, trim per-dst_decky_uuid history down to 10000 rows
(oldest first). Keeps the events table bounded on long-running fleets
without paying the cost on every write.
2026-04-26 19:58:43 -04:00
5b5ff54fa2 feat(web): orchestrator events read API + SSE stream
GET /api/v1/orchestrator/events — paginated list with optional
kind=traffic|file filter. GET /api/v1/orchestrator/events/stream —
SSE: snapshot on connect, live forward of orchestrator.> bus events
mapped to 'traffic' / 'file' SSE event names.

Repo gains list_orchestrator_events(limit, offset, kind?, since_ts?),
count_orchestrator_events(kind?), and prune_orchestrator_events
(per_dst_cap=10000) for periodic worker-side trimming.
2026-04-26 19:58:12 -04:00
900c0c3ef5 refactor(bus): rename ORCHESTRATOR_ACTIVITY → ORCHESTRATOR_TRAFFIC
Aligns the bus token with the DB column value; OrchestratorEvent.kind
is 'traffic'/'file' but the topic was 'activity'/'file'. The asymmetry
made consumer code (UI filter, SSE event names) need a translation
layer. No external subscribers existed yet.
2026-04-26 19:53:40 -04:00
4c37ece39e feat(orchestrator): MVP synthetic life-injection worker (SSH only)
Adds a new decnet orchestrate worker whose job is to keep the honeypot
ecosystem from looking suspiciously static — a frozen LAN with no
inter-host traffic and no filesystem aging is its own honeypot tell.

MVP scope:
- New OrchestratorEvent table + repo methods (purpose-built sibling
  to Log so synthetic events stay separable from attacker-driven ones).
- New orchestrator.{activity,file}.<decky_id> bus topics +
  system.orchestrator.health heartbeat.
- SSH-only driver. Traffic action runs python3 inside src container
  to TCP-connect dst:22 and read the SSH banner — real on-the-wire
  SSH-protocol traffic without shipping creds. File action drops or
  refreshes a small file via docker exec on the destination.
- Random scheduler (50/50 traffic/file when >=2 SSH-capable deckies
  are running). Diurnal shaping, role-aware pairing, and session-aware
  backoff are explicit non-goals for MVP.
- CLI registration, systemd unit (SupplementaryGroups=docker),
  worker-registry entry so the dashboard shows orchestrator health.
- 11 tests: scheduler policy, driver argv shape + injection-safety,
  end-to-end one-tick integration with FakeBus + SQLite.
2026-04-26 19:43:20 -04:00
d531cea536 feat(web): read-only campaigns API + SSE + frontend
API: /api/v1/campaigns (paginated list), /api/v1/campaigns/{uuid}
(soft-merge chain follow), /api/v1/campaigns/{uuid}/identities
(member identities), and /api/v1/campaigns/events (SSE under
campaign.> + JWT-via-?token=, snapshot-on-connect). Mirror of the
identity router; same auth, same shape, same OpenAPI tags pattern.

Frontend: CampaignDetail.tsx page (same visual vocabulary as
IdentityDetail), useCampaignStream hook (mirror of
useIdentityStream), /campaigns/:id route, IdentityDetail's
CAMPAIGN badge becomes clickable and navigates to the campaign.
useIdentityStream now listens for identity.campaign.assigned so
the badge appears live without a manual refresh.
2026-04-26 09:20:17 -04:00
75af00c9c8 test(clustering): full-bound passes through production campaign clusterer
Runs the chained identity + campaign clustering pipeline against all
seven fixtures via from_synthetic / from_synthetic_identity adapters
and ratchets every YAML floor to 1.0 — the production clusterer
(and the reference clusterers used in the per-fixture tests) all
score perfectly across ARI / homogeneity / completeness /
singleton_recall on each fixture.

Three substrate fixes surfaced by the ratchet:

- Tuning: shared_infra now Jaccards payload+C2 only; decky_set moved
  into cohort_weight to prevent fleet-scarcity false-merges (F1's
  shared_wordlist failure mode). Tier weight raised to 1.0 so
  shared payload+C2 alone crosses threshold (F5's intended pass).
- Adapter: from_synthetic_identity now reads SyntheticSession
  started_at + duration_s for session_windows and per-decky
  timestamps (the production-row adapter still uses start_ts/end_ts
  when available).
- Fixture data: paused_campaign.yaml's JA3 collided exactly with
  vpn_hopping.yaml's (same TLS extension list). The collision
  fused two unrelated campaigns under the chained identity layer
  in the noise_floor composite. Made paused's JA3 distinct.

Also wires Campaign / CampaignsResponse into models/__init__.py's
__all__ that was missed in the schema commit.
2026-04-26 09:13:59 -04:00
6936a1426c feat(clustering): campaign-clusterer worker + bus topics + CLI
The campaign clusterer worker mirrors the identity-side worker shell
(bus connect, heartbeat, control listener, slow-tick fallback) but
wakes on identity.> instead of attacker.> — campaign-level work is
gated on identity-layer changes, not raw observations.

The connected-components implementation reads identities via
list_identities_for_clustering, projects them with from_identity_row,
runs union-find over combined_campaign_weight, writes campaigns rows,
sets attacker_identities.campaign_id, and runs the same revocable-
merge pass as the identity layer (a merged-out campaign whose
identities no longer co-cluster with the winner gets revoked).

Bus: adds campaign.> family (formed / identity.assigned / merged /
unmerged) plus the cross-family identity.campaign.assigned so
existing identity-stream subscribers see the badge update without
having to subscribe to campaign.>. Wiki Service-Bus.md updated in
wiki-checkout in the same wave per the project's bus-signals
discipline.

CLI: decnet campaign-clusterer registered as master-only via
MASTER_ONLY_COMMANDS; --poll-interval / --daemon mirror the identity
clusterer command surface.
2026-04-26 09:04:00 -04:00
0946bab424 feat(clustering): campaign-level similarity primitives
The signal taxonomy for the campaign clusterer (next commit). Mirror
of the identity-layer module but with edge families that don't
translate 1:1: phase-handoff (load-bearing for F5 multi_operator —
the signal the identity-side fingerprint-disagreement veto deliberately
isn't), shared-infra (vetoed at identity level, primary positive
signal here), temporal-overlap (pairwise-relative — F7 invariance
preserved), cohort (weak supporting weight only).

Tier weights tuned so phase-handoff alone crosses threshold (F5),
shared-infra + temporal-overlap together cross (canonical co-op
pattern), and shared-infra + cohort together do NOT (F1
shared_wordlist's failure mode). The F7 time-shift invariant is
explicitly tested on every time-bearing edge and on the combined
weight.
2026-04-26 08:57:46 -04:00
0a1cf65ddb feat(db): Campaign SQLModel + repo write/read methods
Adds the campaigns table and the BaseRepository / SQLModelRepository
methods that the campaign-clusterer worker (next commit) needs to
populate it. Mirrors the AttackerIdentity layer: schema_version from
day one for federation gossip, soft-merge via merged_into_uuid with a
chain-walking get_campaign_by_uuid, list_campaigns excluding merged-
out rows while list_all_campaigns returns the unfiltered set for the
revoke pass. attacker_identities.campaign_id gets a real FK now that
the target table exists.
2026-04-26 08:54:28 -04:00
97aa57faed feat(api): SSE stream for identity events at /api/v1/identities/events
Mirrors GET /api/v1/topologies/{id}/events: subscribes to identity.>
on the bus for the duration of the request and forwards each event as
a named SSE frame (formed / observation.linked / merged / unmerged).

The endpoint is broadly scoped (every identity event, not per-uuid)
because both AttackerDetail and IdentityDetail need the same
firehose: AttackerDetail watches for an identity.formed that finally
binds its identity_id; IdentityDetail watches for
observation.linked / merged / unmerged against its current row. A
per-uuid filter would force the client to know its identity before
subscribing, which it doesn't always.

JWT via ?token= (EventSource can't set headers), require_stream_viewer
gate, sse_connection_slot per-user cap, snapshot-on-connect with
the first 50 identities so the client buffer renders without a
separate REST call.

Bus-disabled / unreachable path keeps the connection alive on
keepalives so the client doesn't reconnect-storm; it can re-poll
the REST API on its own timer.
2026-04-26 08:36:17 -04:00
e364ef8859 feat(clustering): revocable merges (merge + unmerge)
Reworks the clusterer's tick to handle multi-identity components and
re-evaluate prior merges. Two passes per tick:

Pass 1 — per-component reconciliation:
  * Fresh component → mint identity (commit 4 path).
  * Single-identity component → link unassigned observations.
  * Multi-identity component → soft-merge: pick the smallest-uuid
    winner deterministically, set merged_into_uuid on each loser,
    link unassigned observations to the winner. Observations stay
    FK'd to their original identity row — the merge is a soft
    pointer, not a re-point. Audit trail preserved; cached
    subscribers resolve through the chain.

Pass 2 — revocable-merge undo:
  * For each merged-out identity, check whether its observations
    still cluster with its winner's. If not, the merge is
    contradicted by new evidence — clear merged_into_uuid and emit
    identities_unmerged. The resurrected identity keeps its original
    uuid, so subscribers that cached it during the merged interval
    re-attach without a new lookup.

A pre-built merge-chain dict feeds Pass 1 so the effective-identity
lookup is O(1) per observation. The chain has a hop cap (paranoia
against accidental cycles in the underlying state).

Repo additions on BaseRepository + SQLModelRepository:
  * list_all_identities() — includes merged-out rows.
  * update_identity_merged_into(uuid, winner_or_None) — single
    setter for both merge and unmerge.
DummyRepo coverage stub updated.

Tests:
  * Two distinct identities bridged by a new observation merge with
    the smaller uuid as winner.
  * A pre-seeded soft-merge whose underlying observations diverge
    gets revoked; resurrected uuid emerges with merged_into_uuid
    cleared.
  * Tick is idempotent under no state changes.
2026-04-26 08:33:32 -04:00
ed323581fe feat(clustering): fingerprint-disagreement veto for fixture 5
Two operators cooperating on one campaign can share C2 endpoints +
stage-1 payloads while running distinct tooling — fixture 5
(multi_operator) is the canonical demonstration. The identity
clusterer must NOT fuse them: shared infra is a campaign-level
signal, not an identity-level one. The campaign clusterer (downstream
work) handles that grouping over identities.

Mechanism: when two observations have non-null fingerprints AND the
fingerprints fully disagree, the high-weight tier drops the payload
and C2 contributions to zero. JA3 / HASSH agreement still returns
1.0 directly — no veto applies when something agrees. Partial
agreement (one slot agrees, another disagrees) is treated as
agreement, since stable-tool partial overlap is more consistent
with one identity than two.

The veto only triggers when there is actual disagreement evidence —
two un-fingerprinted observations sharing a C2 still cluster, since
the absence of fingerprints is not the same as disagreement on them.

Fixture 5 production-clusterer assertion added at identity level:
ARI = 1.0, homogeneity = 1.0, exactly 2 predicted clusters from
2 truth identities. Phase-handoff edges (from the TODO) belong to
the downstream campaign clusterer, not this identity clusterer.
2026-04-26 08:24:22 -04:00
f7da33726c feat(clustering): combined edge weight + medium-tier wiring
The clusterer now drops a single high-tier function call in favor of
a tier-weighted sum. Tier multipliers (high=1.0, medium=0.6, low=0.2,
very_low=0.05) are tuned so the threshold (1.0) admits high-tier
agreement alone while leaving every weaker tier — and every
combination of weaker tiers — under threshold.

Per-tier discipline tested:
- high alone clusters
- medium alone does NOT cluster (supporting signal only)
- low alone does NOT cluster (fixture 1's failure mode)
- very-low alone does NOT cluster (fixture 2's failure mode)
- all three weak tiers stacked still don't reach threshold
- high + medium clusters (high already saturates)

The combination is forward-compatible: low + very-low contributions
are computed today but always project to 0.0 because the production
adapter doesn't populate credentials / ASN-edge inputs into the
fixture path yet. Their contribution becomes load-bearing in commit 7
when the low-tier landing tightens the F1 / F2 bounds.

Fixture 4 (paused_campaign) ratchet added: high-tier signal carries
the multi-day-silence campaign into one identity. Time-agnostic
invariant — silence is irrelevant to the edge weight.
2026-04-26 08:22:10 -04:00
de2f4c3a62 feat(clustering): wire high-weight edges end-to-end
The connected-components clusterer now writes attacker_identities
rows + sets attackers.identity_id when high-weight signals (JA3 /
HASSH / payload-hash / C2-endpoint exact match) agree across
observations. Singletons stay un-fingerprinted and un-clustered.

Algorithm split:
- cluster_observations(observations) — pure union-find over the
  high-weight edge function. Same code path for fixture validation
  and production tick.
- from_attacker_row(row) — production-row adapter; recovers JA3 +
  HASSH from Attacker.fingerprints JSON. Payload + C2 join from
  logs in later commits; the function shape doesn't change.

Repo additions on BaseRepository + SQLModelRepository:
- list_attackers_for_clustering(limit=None)
- create_attacker_identity(row)
- set_attacker_identity_id(attacker_uuid, identity_uuid)
DummyRepo coverage stub updated.

v1 behavior is conservative: only assigns identities to observations
whose identity_id is currently NULL. Multi-identity components are
skipped this pass — merge / re-assign lands in commit 10 with
revocable merges.

Fixture bounds tightened against the production clusterer:
- lone_wolf (F3) — singletons stay singletons
- shared_wordlist (F1) — credential-only overlap doesn't cluster
  (high-weight tier doesn't include credentials)
- vpn_hopping (F2, identity-level) — 5 rotated IPs with stable JA3
  + HASSH fold into one identity, ARI = 1.0, completeness = 1.0
2026-04-26 08:19:56 -04:00
a9775c4000 feat(clustering): similarity-graph primitives
Adds the four weight-tier edge functions as pure, time-agnostic
scoring primitives over an Observation projection. Each returns a
score in [0, 1]; the connected-components impl will combine + threshold
in subsequent commits.

Tier semantics (from IDENTITY_RESOLUTION.md):
- high   — JA3/HASSH/payload-hash/C2-endpoint exact match
- medium — phase-bucketed command-sequence Jaccard
- low    — credential-attempt-set Jaccard (defeated alone by F1)
- very low — ASN equality (defeated alone by F2)

Time-agnostic invariant is a static test: Observation has no time
fields, so no edge function can silently start using them. Fixture 7
forbids recency-decay clustering on multi-month APT campaigns.

A from_synthetic() adapter projects SyntheticAttacker corpora into
Observation; the production-row adapter lands when the clusterer
starts reading the attackers table.
2026-04-26 08:13:29 -04:00
fb522af107 feat(bus): reserve identity.unmerged topic
Revocable merges (a contradiction-driven undo of identity.merged) ship
in the clusterer work; this reserves the topic up-front so identity.>
subscribers receive it day one without a re-subscribe.

The clusterer worker's ClusterResult fan-out now publishes on
identity.unmerged when populated. The skeleton clusterer never
populates it; the revocable-merge commit will.

Wiki update lives in wiki-checkout/Service-Bus.md (separate repo).
2026-04-26 08:10:56 -04:00
e545f7d8d3 feat(clustering): identity clusterer worker skeleton
Adds the decnet clusterer master-only command + provider-subpackage
shape (base.py + factory.py + impl/connected_components.py) so
subsequent commits can land similarity-graph features without
churning callers.

The skeleton ConnectedComponentsClusterer.tick is a no-op; the
worker shell is fully wired (bus consumer on attacker.observed +
attacker.scored, slow-tick fallback, health heartbeat, control
listener, ClusterResult fan-out to identity.formed/observation.linked
/merged). Subscribers on identity.> see no events from this clusterer
until edge functions land, but the lifecycle is in place.
2026-04-26 08:09:11 -04:00
4f1077be72 feat(bus): identity.* topic family (formed / observation.linked / merged)
Fourth of the five-step identity-resolution substrate. Constants and
builder ship now; no publishers exist yet — they land with the
clusterer worker. Subscribers (webhook worker, dashboard SSE relay)
can register against identity.> from day one.

* decnet/bus/topics.py — IDENTITY root + IDENTITY_FORMED /
  IDENTITY_OBSERVATION_LINKED / IDENTITY_MERGED leaves; identity()
  builder mirroring the attacker() / system() helpers. Module
  docstring topic-tree updated.
* tests/bus/test_topics.py — assert builder produces the expected
  three topic strings + rejects empty event_type.

Wiki Service-Bus.md and a new Identity-Resolution.md page land in the
companion wiki-checkout commit.
2026-04-26 07:15:44 -04:00
dc3d08dd41 feat(web): read-only /api/v1/identities/* endpoints + repo methods
Second of the five-step identity-resolution substrate. Ships the API
surface against the empty AttackerIdentity table from commit 1 — every
endpoint returns empty/404 cleanly until the clusterer populates rows.

Routes (auth-gated, viewer role):
* GET /api/v1/identities — paginated list, excludes merged-out rows
* GET /api/v1/identities/{uuid} — detail; transparently follows
  merged_into_uuid to surface the canonical winner
* GET /api/v1/identities/{uuid}/observations — Attacker rows FK'd
  to the (resolved) identity uuid

Repository (BaseRepository abstract + SQLModelRepository concrete):
* get_identity_by_uuid (with merge-chain following, hop-bounded)
* list_identities / count_identities (excluding merged-out)
* list_observations_for_identity / count_observations_for_identity

Tests: 12 new (empty-table behavior, seeded data, merge-chain
resolution, repo-level smoke against real SQLite). Also fixes the
pre-existing test_base_repo_coverage failure (DEBT-041 added abstract
methods without updating the DummyRepo stub) — included here because
this PR adds 5 more abstract methods, fixing it as a bonus.

474 db/web/profiler/correlation tests green.
2026-04-26 07:08:55 -04:00
84c1ca9c9b feat(identity): AttackerIdentity table + nullable attackers.identity_id FK
Schema-only commit, first of the five-step substrate for identity
resolution. The clusterer that populates identities lands later; this
ships the table empty and the FK uniformly NULL on existing rows.

* decnet/web/db/models/attackers.py — new AttackerIdentity SQLModel
  (uuid PK, schema_version, fingerprint summary lists, kd_digraph_simhash,
  merged_into_uuid self-FK, all clusterer-populated fields nullable).
  Attacker grows a nullable indexed identity_id FK + docstring marking
  it as the per-IP observation row.
* decnet/web/db/models/__init__.py — re-exports AttackerIdentity.
* tests/db/test_identity_schema.py — 9 schema invariants: table exists,
  identity_id nullable + indexed, FK targets attacker_identities.uuid,
  schema_version defaults to 1, attacker rows inserted with NULL
  identity_id, FK constraint blocks orphans.

463 unrelated db/web/profiler/correlation tests still green. See
development/IDENTITY_RESOLUTION.md for the full design.
2026-04-26 07:00:24 -04:00