DECNET

Author	SHA1	Message	Date
anti	6376523923	feat(canary): mysql_dump generator with phone-home replica payload Mirrors the Canarytokens.org trick: a base64-wrapped CHANGE REPLICATION SOURCE TO + START REPLICA block in the dump trailer. Importing the file into MySQL resolves <slug>.<dns_zone> (DNS trip) and opens a 3306 replica handshake whose SOURCE_USER smuggles @@hostname and @@lc_time_names of the victim DB. DNS lookup alone is sufficient for detection via the existing canary dns_server; capturing the smuggled metadata via a 3306 handshake responder is a follow-up.	2026-04-27 13:52:55 -04:00
anti	5ac8e0f91a	feat(canary): honeydoc_docx + honeydoc_pdf generators honeydoc previously emitted HTML only — operators picking 'Document' out of the dropdown got a .html file dropped at /Documents/quarterly_report.docx, which any attacker would clock the moment they ran 'file' on it. Two new generators that emit the real artifact format: - honeydoc_docx: stdlib zipfile only. Builds a minimal but valid Office Open XML zip with the same Q3 review body as the HTML flavor and an external-image relationship pointing at the callback URL — same trick the operator-upload DOCX instrumenter uses, fetched on document open by Word and LibreOffice. Reuses _drawing() and _next_rid() from instrumenters/docx.py to keep the body/relationships shape identical between synthesised and instrumented files. - honeydoc_pdf: pikepdf-backed. One-page PDF in the 14 base fonts (Helvetica, no font embedding), realistic body, /OpenAction /URI on the catalog so most viewers fire the callback on document open. Falls back to a clear error if pikepdf is missing so the operator can switch to honeydoc / honeydoc_docx. Default placement paths now reflect each generator's true extension (.html / .docx / .pdf) so the UI suggests something sensible. Both generators surfaced in the New Token modal's generator dropdown.	2026-04-27 13:44:20 -04:00
anti	c17b9e01c8	fix(canary): stream base64 payload via stdin to avoid ARG_MAX Real-world plant() crashed with OSError [Errno 7] Argument list too long when an artifact (honeydoc HTML / DOCX / PDF) base64-encoded into the sh -c script body exceeded the kernel's argv limit (typically 128KB-2MB depending on the host). Fix: keep the script trivial ('mkdir -p ... && base64 -d > path && ...') and stream the encoded bytes through 'docker exec -i ... sh -c' stdin instead. _run() grew an optional stdin_bytes parameter that's piped into proc.communicate(input=...). The stdin path covers arbitrarily large artifacts. Tests updated: - test_plant_argv_and_base64_round_trip now asserts the docker -i flag is present and the base64 payload reaches stdin (and notably is NOT in the script body). - _FakeProc.communicate accepts input=None across the board so the patched fast path no longer trips on the new kwarg.	2026-04-27 13:37:19 -04:00
anti	53d08e01e5	feat(systemd): decnet-canary.service unit + tests Worker unit mirrors decnet-webhook.service shape: simple type, runs as the decnet user/group, append-style log file, full security hardening (NoNewPrivileges/ProtectSystem/ProtectHome/PrivateTmp/LockPersonality + the rest). Added /var/lib/decnet to ReadWritePaths because the API process persists operator-uploaded canary blobs there. CAP_NET_BIND_SERVICE granted (ambient + bounded) so an operator who overrides DECNET_CANARY_DNS_PORT to 53 or HTTP_PORT to 80/443 in .env.local doesn't need to fight systemd. The defaults stay unprivileged (5353 / 8088). Added decnet-canary.service to decnet.target so 'systemctl start decnet.target' brings it up alongside the rest of the workers. decnet init auto-discovers deploy/decnet-*.service.j2 files (per decnet/cli/init.py:_install_units) so no further wiring needed — running 'decnet init' on a fresh host installs the new unit. Static tests confirm the unit references decnet canary, depends on the bus, carries the standard security directives, and is listed in the master target.	2026-04-27 13:20:47 -04:00
anti	34c85346a6	feat(deploy): seed canary baseline at deploy time + tests Hooks decnet.canary.planter.seed_baseline into the deploy() flow's fleet-mirror step. After upserting a FleetDecky as 'running' we seed the configured baseline canary set on the freshly-deployed decky. Persona detection: read d.nmap_os (Windows -> windows path-mapping, otherwise linux). Failures are logged and surface as state=failed rows in the UI; the deploy itself MUST NOT abort (resilience principle in CLAUDE.md). Tests confirm: - seed_baseline produces one row per configured generator per decky; - the deployer source wires seed_baseline inside a try/except so a failure can't abort the deploy.	2026-04-27 13:19:08 -04:00
anti	6c4ea706f8	feat(api): canary token CRUD router (/api/v1/canary) + tests Two sub-routers under /api/v1/canary: blobs (operator-uploaded artifacts, deduped by sha256): - POST /blobs (multipart upload; admin) - GET /blobs (list with token_count; admin) - DELETE /blobs/{uuid} (refcount-aware; 409 when referenced; admin) tokens (per-decky planted artifacts): - POST /tokens (generate or instrument + plant; admin) - GET /tokens?decky_name=&kind=&state= (filter; viewer) - GET /tokens/{uuid} (detail; viewer) - GET /tokens/{uuid}/preview (instrumented bytes; admin) - GET /tokens/{uuid}/triggers (paged callback log; viewer) - DELETE /tokens/{uuid} (revoke + bus event; admin) XOR validation: exactly one of blob_uuid / generator must be set. Path validation rejects relative/NUL/newlines/.. segments. Every body-bearing route documents 400 plus 401/403/404 as applicable. Stdlib MIME sniffer (no python-magic dep) covers PNG/JPEG/GIF/PDF/HTML/XML/DOCX/XLSX/JSON/YAML/TOML/text/plain; everything else falls through to passthrough. Tests run end-to-end through the live FastAPI app (planter docker exec is patched); 17 cases covering dedup, refcount, lifecycle, XOR validation, path validation, and 404 paths.	2026-04-27 13:18:00 -04:00
anti	f9513bb7dd	feat(cli): register decnet canary subcommand + tests decnet canary launches the HTTP + DNS callback receiver via decnet.canary.worker.run. Mirrors the shape of decnet webhook (typer command with --daemon flag, asyncio.run in the foreground). Deliberately NOT added to MASTER_ONLY_COMMANDS — every host that hosts deckies runs its own canary worker, and the bus events stay local to that host (per-host webhook fanout handles SIEM egress).	2026-04-27 13:13:23 -04:00
anti	fae3e0caa3	feat(canary): worker (HTTP + stdlib DNS callback receivers) + tests decnet canary worker hosts both callback surfaces in one process: - HTTP: a tiny FastAPI app on its own port (default 8088). The only meaningful route is GET /c/{slug} which looks up the slug, persists a CanaryTrigger, publishes canary.<id>.triggered, and returns a 1x1 transparent GIF. Unknown slugs return the same response (stealth); no decnet strings leak in headers/banners; docs/openapi/redoc are disabled. X-Forwarded-For is honored. - DNS: an authoritative UDP server for *.<canary_zone> using asyncio.DatagramProtocol with stdlib-only DNS wire-format parsing (no dnslib dep). Same lookup -> persist -> publish flow, plus a sinkhole A record (192.0.2.1) so the attacker's resolver doesn't loop on NXDOMAIN. Single-label slugs only; multi-label probes return NXDOMAIN. Pointer loops in malformed queries are caught (10-hop cap) so an adversarial packet can't wedge the parser. Tests cover both surfaces without privileged sockets: - HTTP via Starlette TestClient: known/unknown slug, headers, XFF, stealth-string assertions. - DNS via direct DatagramProtocol drive: known slug -> ANSWER, unknown -> NXDOMAIN, pointer-loop -> ValueError, malformed packet -> silent drop.	2026-04-27 13:12:05 -04:00
anti	8fb9bc5545	feat(canary): planter (docker exec injector) + tests Plant / revoke / seed_baseline using the same docker-exec-with-sh-c pattern proven by decnet/orchestrator/drivers/ssh.py:_run_file. Each plant call composes a single sh script: mkdir -p <dirname> && printf %s <base64> | base64 -d > <path> && chmod <mode> <path> && touch -d @<mtime> <path> Base64-on-the-host / decode-in-the-container keeps binary artifacts (DOCX/PDF/PNG) safe across the argv boundary; the placement_path, mode, and mtime are shlex-quoted. State transitions hit the repo: planted -> failed on docker error with stderr captured into last_error. Bus events fire on success (canary.<id>.placed) and on revoke (canary.<id>.revoked) — wrapped in try/except so a downed bus never blocks a placement. seed_baseline(decky_name, repo) is the deploy-hook entry point — reads DECNET_CANARY_BASELINE (default git_config,env_file,honeydoc,aws_creds), persists one row per generator, plants each. Failed placements are logged but do NOT abort; the deployer hook treats the return list as informational.	2026-04-27 13:08:18 -04:00
anti	19ceff4417	feat(canary): operator-upload instrumenters + tests Seven instrumenters that mutate operator-supplied artifacts to embed the callback URL: - passthrough — bytes unchanged; only DNS-callback tokens trip detection, with the slug embedded in the placement path - plain — substitutes {{CANARY_URL}}/{{CANARY_HOST}} placeholders; falls back to appending a comment line whose prefix adapts to the apparent file syntax (#, //, ;) - html — injects a 1x1 tracking pixel before </body>, appends if the close tag is missing - docx — direct zipfile manipulation (no python-docx dep): inserts an external-image Relationship into word/_rels/document.xml.rels and a matching <w:drawing> element before </w:body> - xlsx — sibling of docx; injects an external-image relationship into xl/_rels/workbook.xml.rels (orphan rels are still fetched on open by most viewers) - pdf — uses pikepdf to install /OpenAction /URI on the catalog; rejects with a clear message when pikepdf isn't installed - image — uses Pillow to embed slug + URL in PNG tEXt / JPEG comment; rejects with a clear message when Pillow isn't installed DOCX and XLSX share the rId allocator + relationship injector via the docx module; both work on stdlib zipfile only. Tests synthesise minimal real DOCX/XLSX fixtures inline, round-trip each instrumenter, and assert the callback URL ends up in the mutated bytes while the file still parses.	2026-04-27 13:03:42 -04:00
anti	c7658ea65e	feat(canary): synthesised-artifact generators + tests Five built-in generators that produce deterministic fake artifacts keyed by the token slug: - aws_creds — passive [default]/[prod] credentials block, no callback wiring (AWS-key tokens require an external trap, which is post-v1) - git_config — .git/config with origin url = http_base/c/<slug>/repo.git - env_file — .env with API_BASE_URL + WEBHOOK_NOTIFY_URL embedding the callback URL plus inert realism filler - ssh_key — PEM-shaped fake private key whose host comment carries <slug>.<dns_zone> when DNS is deployed, else the http_base host - honeydoc — minimal HTML report with a 1x1 tracking-pixel <img> whose src is the callback URL; fallback for the deploy-time baseline before the operator uploads a real DOCX/PDF Tests assert byte-stability (same ctx -> same bytes), slug presence in the embedded fields, that aws_creds is intentionally URL-free, and that every artifact carries operator-facing notes for the preview endpoint.	2026-04-27 12:59:19 -04:00
anti	8f19adecfe	feat(canary): package scaffolding (base/factory/paths/storage) + tests Mirrors the decnet.intel layout (base + factory + lazy concrete imports). Defines: - CanaryArtifact / CanaryContext dataclasses + the generator and instrumenter ABCs they share - factory dispatch for generators (git_config/env_file/ssh_key/aws_creds/honeydoc) and instrumenters (docx/xlsx/pdf/html/image/plain/passthrough), plus pick_instrumenter_for_mime() for MIME-driven dispatch on operator uploads - persona-aware default placement paths (Linux vs. Windows-shaped) and absolute-path validation that the API will use to validate operator-supplied placement_path values - on-disk blob store: sha256-keyed two-level fan-out, idempotent writes, refcount-aware unlink (the DB row is the source of truth) Also covers prior commits' tests (bus topics, models, repo CRUD) under tests/canary/. 79 tests, all pass.	2026-04-27 12:56:01 -04:00
anti	f046634d6e	feat(web): Persona Generation page under AUTOMATION New dashboard surface for editing the global emailgen persona pool — the JSON file fleet (MACVLAN/IPVLAN) and SWARM-shard mail deckies pull from. MazeNET topology personas are out of scope here; they're configured per-topology in the topology editor. Backend: * GET/PUT /api/v1/emailgen/personas — admin-write, viewer-read. PUT validates with the same Pydantic schema the worker uses (parse_personas), drops invalid entries with a warning, returns 400 only when the entire payload fails. Path is operator-discoverable on every response so a CLI-driven backup workflow stays visible. Frontend: * PersonaGeneration.tsx + .css — table + add/edit modal with the full EmailPersona schema (name, email, role, tone, mannerisms list, language, signature, active hours, reply latency, uses_llms_heavily). Local edits are batched; explicit "SAVE CHANGES" writes back, with a dirty-indicator pill and a "DISCARD" reset. Email uniqueness is enforced client-side so the scheduler never picks the same persona as both sender + recipient. * Sidebar AUTOMATION group gains a "Persona Generation" entry next to Orchestrator; route registered at /persona-generation. The worker reads the same on-disk file the API writes — see decnet.orchestrator.emailgen.global_pool. The API resets the in-process cache on every read/write so the worker picks up dashboard edits within its next tick rather than waiting on mtime.	2026-04-27 09:55:42 -04:00
anti	818aebadfc	feat(web): emailgen events in Orchestrator page The SSE pipe at /orchestrator/events/stream was already streaming 'orchestrator.email.{decky_uuid}' events (the subscription is for the 'orchestrator.>' wildcard), but the consumer side dropped them on the floor. Three fixes to close the loop: * useOrchestratorStream.ts now registers an 'email' SSE listener — the EventSource silently ignores frames whose event name has no listener, so missing this entry meant every email frame was dropped before reaching the page's onEvent handler. * /api/v1/orchestrator/events accepts kind=email and dispatches to list_orchestrator_emails, adapting rows to the existing wire shape: subject -> action, sender_email -> src_decky_uuid, recipient_email -> dst_decky_uuid, plus email-specific extras (thread_id, language, mail_decky_uuid, message_id, in_reply_to) ride along as top-level keys. * Orchestrator.tsx gains an 'email' tab in the kind filter and a branch in the row renderer / inspector that: - shows full sender / recipient (no UUID truncation), - chips the language code next to the subject, - relabels ACTION as SUBJECT in the inspector and surfaces thread / in-reply-to / mail-decky details. The 'all' tab continues to show traffic+file only (today's behavior); operators see emails by switching to the email tab. A union view at the API layer is the obvious follow-up but not necessary for now.	2026-04-26 22:56:48 -04:00
anti	f97ec4c2c1	feat(deploy): emailgen systemd unit + bring orchestrator + emailgen into decnet.target Plug emailgen into the systemd-supervised fleet: - New deploy/decnet-emailgen.service.j2 mirroring decnet-orchestrator's shape: simple service, restart-on-failure, docker supplementary group (driver shells `docker exec` to drop EMLs into the spool), the same hardening directives as the rest of the fleet. - decnet.target now Wants both decnet-emailgen.service and decnet-orchestrator.service. Orchestrator's absence from the target was a historical oversight — fixing it here while the file is open. `decnet init` already globs deploy/decnet-*.service.j2 so the new unit ships automatically; no init-side change needed. Emailgen-specific env knobs (DECNET_EMAILGEN_LLM, _MODEL, _PERSONAS, _TIMEOUT) are documented in the unit and operator-tunable via /opt/decnet/.env.local.	2026-04-26 22:49:16 -04:00
anti	73692b52f0	feat(emailgen): gate as master-only Two-layer gating per CLAUDE.md: - registration-time: emailgen added to MASTER_ONLY_GROUPS so agents don't see the sub-app in 'decnet --help' at all. - body-guard: _require_master_mode('emailgen ...') at the top of every sub-command body so a direct callable import (third-party tooling) still bails on agent hosts. Matches the convention used for 'swarm', 'topology', 'geoip'. SWARM agents push their generated mail through the master's emailgen worker (or none at all); cross-agent emailgen federation stays out of scope.	2026-04-26 22:45:59 -04:00
anti	6d520eaa6f	refactor(emailgen): pluggable LLM backend (base/factory/impl) Lift the Ollama subprocess shell-out out of EmailDriver and into a proper provider subpackage shape: decnet/orchestrator/emailgen/llm/ base.py — LLMBackend Protocol + LLMResult + LLMTimeout factory.py — get_llm() reads DECNET_EMAILGEN_LLM impl/ollama.py — current 'ollama run' subprocess path impl/fake.py — canned-output backend used by tests Driver now takes an LLMBackend on construction (or inherits the factory default). Tests inject FakeBackend instead of monkeypatching the subprocess layer, which is cleaner and ~10x faster. Swapping Ollama for the Anthropic API / vLLM / llama.cpp is now a third branch in factory.py; no driver rewrite needed. Mirrors the convention used by decnet.web.db.factory + decnet.bus.factory per the provider-subpackages-from-day-one rule in memory.	2026-04-26 22:43:36 -04:00
anti	4badc75fb2	feat(emailgen): global persona pool + Date-stamped EML mtimes Two changes that unwind earlier MazeNET-only assumptions and fix a realism tell: 1. Persona resolution is now per-decky-source, not topology-only. The scheduler walks the union view (list_running_deckies, including fleet MACVLAN/IPVLAN + SWARM shards) and picks the right persona list for each source: * topology decky -> Topology.email_personas (per-topology richness preserved) * fleet / shard -> a single host-wide pool loaded from disk (DECNET_EMAILGEN_PERSONAS, /etc/decnet/email_personas.json, or ~/.decnet/email_personas.json) Operators install the global pool via 'decnet emailgen import-personas <file>' which validates with the same Pydantic schema the worker uses. 2. The driver now runs 'touch -d <Date>' inside the docker exec right after the EML write so file mtime matches the email's RFC 2822 Date: header. Without this an attacker 'ls -lt'ing the spool sees every email clustered inside the worker's tick window — the cluster itself was a stylometric tell. CLI now exposes 'decnet emailgen' as a sub-app with 'run' (default, backwards-compatible with bare 'decnet emailgen') and 'import-personas'. list_running_deckies carries topology_id through so consumers can resolve the parent topology without a second round-trip.	2026-04-26 22:39:16 -04:00
anti	2979997442	feat(templates): IMAP/POP3 servers read EML spool from emailgen When IMAP_EMAIL_SEED / POP3_EMAIL_SEED points at a directory of .eml files (the orchestrator emailgen worker's drop path, /var/spool/decnet-emails/ by convention), the bait mailbox is replaced with those LLM-generated, persona-driven, threaded messages. Empty / missing dir keeps the hardcoded fallback so a fresh deployment is never silent. Cached with mtime invalidation + a short TTL so a hot mailbox doesn't pay the parse cost on every IMAP/POP3 command. Replaces the DEBT-026 stub on both templates that named the env var but never wired it through.	2026-04-26 22:21:01 -04:00
anti	3ee55ec341	feat(emailgen): Ollama-driven fake email worker for IMAP/POP3 deckies Second orchestrator worker (decnet emailgen) that drips persona-driven, threaded, multi-language fake emails into running mail deckies. Personas live on Topology.email_personas; topology-wide language_default falls through to any persona that doesn't pin its own. Em-dashes are suppressed at the prompt layer by default and only lifted for personas explicitly marked uses_llms_heavily — em-dashes are an LLM tell and a flat corpus of em-dashed mail is a giveaway. EML delivery writes into /var/spool/decnet-emails/<thread>/<msg>.eml on the mail decky via docker exec; wiring the IMAP/POP3 templates to read from that spool (replacing the hardcoded _BAIT_EMAILS) is the next step.	2026-04-26 22:16:19 -04:00
anti	430262e01a	feat(fleet): systemd unit + bus signal for fleet reconciler Two pieces, one PR because they share a deployment surface: 1. systemd. decnet-reconciler.service.j2 mirrors the orchestrator unit shape (docker group, hardened sandbox, append-logs). Read-only /var/lib/decnet so it can read decnet-state.json without write access. Auto-discovered by `decnet init` via the existing decnet-*.service.j2 glob — no init.py change needed. Added to decnet.target so `systemctl start decnet.target` brings it up alongside collector / sniffer / mutator / etc. Also added to the agent reaper script so self-destruct cleans it up on workers. 2. Bus signal. reconcile_once now publishes `decky.<host_uuid:name>.state` on every insert / delete / state-changed transition. Reuses the existing DECKY_STATE topic family (no bus/topics.py change → no wiki update needed per the bus-signals doc rule). Composite host_uuid:name segment keeps fleet rows distinguishable from MazeNET TopologyDecky rows whose ids are bare UUIDs. Quiet ticks publish nothing — convergence means silence. Bus is plumbed through the worker, defaults to None for unit-test callers. publish_safely keeps the source-of-truth contract: DB write is authoritative, the publish is best-effort notification. Captures previous_state into a local before update_fleet_decky_state runs — a fake repo that mutates rows in-place would otherwise see the post-update state and report previous == current. Real repos don't have this concern but the fix is cheap and makes the function less order-dependent.	2026-04-26 21:21:36 -04:00
anti	a8441481b5	fix(orchestrator): see fleet + shard deckies, not just topology rows Switches _one_tick from list_running_topology_deckies to list_running_deckies (the union view added in `095500a`). Resolves the permanent "no actionable deckies (running+ssh count=0)" log on hosts running only unihost MACVLAN / IPVLAN decoys — the orchestrator now sees fleet_deckies rows alongside MazeNET topology rows and SWARM DeckyShard rows. Also fixes the misleading log message: the old "running+ssh count=N" reported the pre-filter total (count of all running deckies, not the SSH-eligible subset that scheduler.pick actually evaluates). New line breaks down running, ssh_eligible, and per-source counts so debugging "why isn't it picking?" no longer requires reading scheduler internals. Regression test: orchestrator integration suite now seeds fleet_deckies rows (not just topology_deckies) and verifies a tick picks them and records an event with dst="local:fleet-*" — proving the original bug on the operator's mothership is fixed.	2026-04-26 21:16:22 -04:00
anti	f775223a83	feat(fleet): reconciler converges JSON ↔ DB ↔ docker Adds decnet.fleet.reconciler — a pure async function plus a long-lived worker — that periodically reconciles the three sources of truth on a DECNET host: 1. decnet-state.json (CLI-canonical fleet record) 2. fleet_deckies table (DB mirror, written by engine.deployer) 3. docker inspect (actual per-container runtime state) Drift handling: * JSON has X, DB doesn't → INSERT (deploy ran with DB offline) * DB has X (this host), JSON doesn't → DELETE (teardown ran with DB offline) * Both have X, docker disagrees → flip state to running/failed/degraded * Docker socket unreachable → leave existing state alone (don't torch every row to torn_down) Cross-host safety: deletions are scoped to host_uuid for the local host; a master that runs both a local fleet and swarm workers will never clobber a peer's slice. CLI: decnet reconcile --once # one-shot, prints counts decnet reconcile [--interval N] # long-lived worker, mirrors orchestrator's lifecycle (control listener + heartbeat + tick loop) Promotes decnet/fleet.py → decnet/fleet/ package so the reconciler can live alongside it without name collision (build_deckies_from_ini and all_service_names re-exported unchanged via __init__.py). 14 new tests cover state aggregation rules, all four drift directions, host_uuid scoping, docker-unreachable safety, and worker shutdown via the bus control event.	2026-04-26 21:14:48 -04:00
anti	646aeeca40	feat(deployer): mirror fleet deploy/teardown into fleet_deckies table CLI deploy now writes both surfaces: decnet-state.json (existing, canonical for offline / no-API hosts) and the new fleet_deckies DB table (visible to orchestrator, web dashboard, REST API). Best-effort: a DB outage logs a warning and returns. The JSON file remains the source of truth for `decnet status`, `decnet teardown`, sniffer, and collector — operators on a CLI-only host keep working. _run_async helper bridges sync deploy() into the async repository. Always uses a fresh thread because the API handler at web.router.fleet.api_deploy_deckies invokes deploy() from inside a FastAPI event loop, which would otherwise break asyncio.run. Verified end-to-end against MySQL: deploy mirror inserts rows, union view (list_running_deckies) returns them with source="fleet", teardown mirror removes them. Works from both sync (CLI) and async (API handler) call sites.	2026-04-26 21:05:50 -04:00
anti	c595d039bd	feat(sniffer): ISN sequence classifier (reuses seq_class helper) Mirrors the IP-ID classifier for TCP ISN values: per-source-IP rolling deque (maxlen=8) populated from each inbound SYN's tcp.seq, classified on every emission. A 'random' verdict is the modern norm; 'incremental', 'zero', or 'constant' indicates legacy stacks or hand-rolled raw-socket tooling — a strong fingerprint signal. Active prober now also captures server_isn (single sample, not classified in-flight; downstream consumers correlating multi-probe results can apply seq_class.classify_sequence themselves). Profiler rollup carries the latest non-'unknown' label into attacker.tcp_fingerprint. Dedup key already covers isn_class from the previous commit, so transitions emit cleanly. UI surfaces ISN class as a colour-coded tag with a ⚠ glyph for non-random verdicts, since they're the genuinely interesting case.	2026-04-26 20:30:24 -04:00
anti	0e40cc8ae1	feat(sniffer): IP-ID sequence classifier (random/incremental/zero/constant) Adds a per-source-IP rolling sample buffer (deque, maxlen=8) for IP-ID values seen on attacker SYNs and a stdlib-only classifier in decnet/sniffer/seq_class.py. Each new SYN appends ip.id and re-classifies the buffer; the result is logged on tcp_syn_fingerprint events alongside sample count. The dedup key now folds in ipid_class so a transition from 'unknown' to a definitive verdict emits exactly one fresh event instead of being suppressed by the old (os|options) key. Profiler rollup carries the latest non-'unknown' label into attacker.tcp_fingerprint. UI surfaces it as a colour-coded tag in the TCP STACK panel: random neutral, incremental amber, zero/constant green (the strong signal).	2026-04-26 20:28:32 -04:00
anti	b0b08754d0	feat(fingerprint): ToS/DSCP/ECN extraction in active + passive TCP fingerprint Active prober now reads ip.tos from the SYN-ACK and emits tos/dscp/ecn alongside the existing TTL/window/options fields. dscp is folded into the fingerprint hash so different DSCP markings produce distinct signatures. Passive sniffer logs the same three fields on tcp_syn_fingerprint events; profiler rollup carries them into the attacker tcp_fingerprint snapshot; AttackerDetail's TCP STACK panel now surfaces DSCP and ECN cells.	2026-04-26 20:25:37 -04:00
anti	5b5ff54fa2	feat(web): orchestrator events read API + SSE stream GET /api/v1/orchestrator/events — paginated list with optional kind=traffic|file filter. GET /api/v1/orchestrator/events/stream — SSE: snapshot on connect, live forward of orchestrator.> bus events mapped to 'traffic' / 'file' SSE event names. Repo gains list_orchestrator_events(limit, offset, kind?, since_ts?), count_orchestrator_events(kind?), and prune_orchestrator_events (per_dst_cap=10000) for periodic worker-side trimming.	2026-04-26 19:58:12 -04:00
anti	4c37ece39e	feat(orchestrator): MVP synthetic life-injection worker (SSH only) Adds a new decnet orchestrate worker whose job is to keep the honeypot ecosystem from looking suspiciously static — a frozen LAN with no inter-host traffic and no filesystem aging is its own honeypot tell. MVP scope: - New OrchestratorEvent table + repo methods (purpose-built sibling to Log so synthetic events stay separable from attacker-driven ones). - New orchestrator.{activity,file}.<decky_id> bus topics + system.orchestrator.health heartbeat. - SSH-only driver. Traffic action runs python3 inside src container to TCP-connect dst:22 and read the SSH banner — real on-the-wire SSH-protocol traffic without shipping creds. File action drops or refreshes a small file via docker exec on the destination. - Random scheduler (50/50 traffic/file when >=2 SSH-capable deckies are running). Diurnal shaping, role-aware pairing, and session-aware backoff are explicit non-goals for MVP. - CLI registration, systemd unit (SupplementaryGroups=docker), worker-registry entry so the dashboard shows orchestrator health. - 11 tests: scheduler policy, driver argv shape + injection-safety, end-to-end one-tick integration with FakeBus + SQLite.	2026-04-26 19:43:20 -04:00
anti	d531cea536	feat(web): read-only campaigns API + SSE + frontend API: /api/v1/campaigns (paginated list), /api/v1/campaigns/{uuid} (soft-merge chain follow), /api/v1/campaigns/{uuid}/identities (member identities), and /api/v1/campaigns/events (SSE under campaign.> + JWT-via-?token=, snapshot-on-connect). Mirror of the identity router; same auth, same shape, same OpenAPI tags pattern. Frontend: CampaignDetail.tsx page (same visual vocabulary as IdentityDetail), useCampaignStream hook (mirror of useIdentityStream), /campaigns/:id route, IdentityDetail's CAMPAIGN badge becomes clickable and navigates to the campaign. useIdentityStream now listens for identity.campaign.assigned so the badge appears live without a manual refresh.	2026-04-26 09:20:17 -04:00
anti	75af00c9c8	test(clustering): full-bound passes through production campaign clusterer Runs the chained identity + campaign clustering pipeline against all seven fixtures via from_synthetic / from_synthetic_identity adapters and ratchets every YAML floor to 1.0 — the production clusterer (and the reference clusterers used in the per-fixture tests) all score perfectly across ARI / homogeneity / completeness / singleton_recall on each fixture. Three substrate fixes surfaced by the ratchet: - Tuning: shared_infra now Jaccards payload+C2 only; decky_set moved into cohort_weight to prevent fleet-scarcity false-merges (F1's shared_wordlist failure mode). Tier weight raised to 1.0 so shared payload+C2 alone crosses threshold (F5's intended pass). - Adapter: from_synthetic_identity now reads SyntheticSession started_at + duration_s for session_windows and per-decky timestamps (the production-row adapter still uses start_ts/end_ts when available). - Fixture data: paused_campaign.yaml's JA3 collided exactly with vpn_hopping.yaml's (same TLS extension list). The collision fused two unrelated campaigns under the chained identity layer in the noise_floor composite. Made paused's JA3 distinct. Also wires Campaign / CampaignsResponse into models/__init__.py's __all__ that was missed in the schema commit.	2026-04-26 09:13:59 -04:00
anti	6936a1426c	feat(clustering): campaign-clusterer worker + bus topics + CLI The campaign clusterer worker mirrors the identity-side worker shell (bus connect, heartbeat, control listener, slow-tick fallback) but wakes on identity.> instead of attacker.> — campaign-level work is gated on identity-layer changes, not raw observations. The connected-components implementation reads identities via list_identities_for_clustering, projects them with from_identity_row, runs union-find over combined_campaign_weight, writes campaigns rows, sets attacker_identities.campaign_id, and runs the same revocable-merge pass as the identity layer (a merged-out campaign whose identities no longer co-cluster with the winner gets revoked). Bus: adds campaign.> family (formed / identity.assigned / merged / unmerged) plus the cross-family identity.campaign.assigned so existing identity-stream subscribers see the badge update without having to subscribe to campaign.>. Wiki Service-Bus.md updated in wiki-checkout in the same wave per the project's bus-signals discipline. CLI: decnet campaign-clusterer registered as master-only via MASTER_ONLY_COMMANDS; --poll-interval / --daemon mirror the identity clusterer command surface.	2026-04-26 09:04:00 -04:00
anti	0946bab424	feat(clustering): campaign-level similarity primitives The signal taxonomy for the campaign clusterer (next commit). Mirror of the identity-layer module but with edge families that don't translate 1:1: phase-handoff (load-bearing for F5 multi_operator — the signal the identity-side fingerprint-disagreement veto deliberately isn't), shared-infra (vetoed at identity level, primary positive signal here), temporal-overlap (pairwise-relative — F7 invariance preserved), cohort (weak supporting weight only). Tier weights tuned so phase-handoff alone crosses threshold (F5), shared-infra + temporal-overlap together cross (canonical co-op pattern), and shared-infra + cohort together do NOT (F1 shared_wordlist's failure mode). The F7 time-shift invariant is explicitly tested on every time-bearing edge and on the combined weight.	2026-04-26 08:57:46 -04:00
anti	0a1cf65ddb	feat(db): Campaign SQLModel + repo write/read methods Adds the campaigns table and the BaseRepository / SQLModelRepository methods that the campaign-clusterer worker (next commit) needs to populate it. Mirrors the AttackerIdentity layer: schema_version from day one for federation gossip, soft-merge via merged_into_uuid with a chain-walking get_campaign_by_uuid, list_campaigns excluding merged-out rows while list_all_campaigns returns the unfiltered set for the revoke pass. attacker_identities.campaign_id gets a real FK now that the target table exists.	2026-04-26 08:54:28 -04:00
anti	97aa57faed	feat(api): SSE stream for identity events at /api/v1/identities/events Mirrors GET /api/v1/topologies/{id}/events: subscribes to identity.> on the bus for the duration of the request and forwards each event as a named SSE frame (formed / observation.linked / merged / unmerged). The endpoint is broadly scoped (every identity event, not per-uuid) because both AttackerDetail and IdentityDetail need the same firehose: AttackerDetail watches for an identity.formed that finally binds its identity_id; IdentityDetail watches for observation.linked / merged / unmerged against its current row. A per-uuid filter would force the client to know its identity before subscribing, which it doesn't always. JWT via ?token= (EventSource can't set headers), require_stream_viewer gate, sse_connection_slot per-user cap, snapshot-on-connect with the first 50 identities so the client buffer renders without a separate REST call. Bus-disabled / unreachable path keeps the connection alive on keepalives so the client doesn't reconnect-storm; it can re-poll the REST API on its own timer.	2026-04-26 08:36:17 -04:00
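A minimal sketch of the named SSE frames that 97aa57faed describes, assuming the frame name is the bus topic with its `identity.` family prefix stripped. The helper names `event_name`, `sse_frame`, and `keepalive_frame` are hypothetical and are not taken from the DECNET source; only the wire format (`event:` / `data:` lines ended by a blank line, comment lines starting with `:`) comes from the SSE specification.

```python
import json


def event_name(topic):
    """Strip the bus family prefix: 'identity.merged' -> 'merged'."""
    return topic.split(".", 1)[1]


def sse_frame(topic, payload):
    """Render one named SSE frame; a blank line terminates the frame."""
    # json.dumps never emits raw newlines, so a single data: line is enough.
    return f"event: {event_name(topic)}\ndata: {json.dumps(payload)}\n\n"


def keepalive_frame():
    """SSE comment line: keeps the connection open, dispatches no event."""
    return ": keepalive\n\n"
```

A browser `EventSource` would receive these through `addEventListener("formed", ...)` and similar handlers, while the keepalive comment is ignored by the client.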
anti	e364ef8859	feat(clustering): revocable merges (merge + unmerge) Reworks the clusterer's tick to handle multi-identity components and re-evaluate prior merges. Two passes per tick: Pass 1 — per-component reconciliation: * Fresh component → mint identity (commit 4 path). * Single-identity component → link unassigned observations. * Multi-identity component → soft-merge: pick the smallest-uuid winner deterministically, set merged_into_uuid on each loser, link unassigned observations to the winner. Observations stay FK'd to their original identity row — the merge is a soft pointer, not a re-point. Audit trail preserved; cached subscribers resolve through the chain. Pass 2 — revocable-merge undo: * For each merged-out identity, check whether its observations still cluster with its winner's. If not, the merge is contradicted by new evidence — clear merged_into_uuid and emit identities_unmerged. The resurrected identity keeps its original uuid, so subscribers that cached it during the merged interval re-attach without a new lookup. A pre-built merge-chain dict feeds Pass 1 so the effective-identity lookup is O(1) per observation. The chain has a hop cap (paranoia against accidental cycles in the underlying state). Repo additions on BaseRepository + SQLModelRepository: * list_all_identities() — includes merged-out rows. * update_identity_merged_into(uuid, winner_or_None) — single setter for both merge and unmerge. DummyRepo coverage stub updated. Tests: * Two distinct identities bridged by a new observation merge with the smaller uuid as winner. * A pre-seeded soft-merge whose underlying observations diverge gets revoked; resurrected uuid emerges with merged_into_uuid cleared. * Tick is idempotent under no state changes.	2026-04-26 08:33:32 -04:00
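The soft-merge pointer and the pre-built merge-chain dict from e364ef8859 can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: `build_merge_chain`, `pick_winner`, and the `MAX_HOPS` value are hypothetical, and the input is assumed to be a plain dict mapping each identity uuid to its `merged_into_uuid` (or `None`).

```python
MAX_HOPS = 16  # hypothetical cap; the commit does not state the real value


def build_merge_chain(merged_into):
    """Map every uuid to its effective (surviving) identity uuid.

    Built once per tick, so the per-observation lookup is a dict read.
    """
    effective = {}
    for uuid in merged_into:
        current = uuid
        for _ in range(MAX_HOPS):
            winner = merged_into.get(current)
            if winner is None:
                break  # reached an identity that is not merged out
            current = winner
        else:
            # No terminal identity within the cap: treat as a cycle.
            raise RuntimeError(f"merge chain from {uuid} did not end in {MAX_HOPS} hops")
        effective[uuid] = current
    return effective


def pick_winner(uuids):
    """Deterministic winner: the smallest uuid in the component."""
    return min(uuids)
```

Unmerging then amounts to setting one entry back to `None` and rebuilding the dict on the next tick, which is why the resurrected identity keeps its original uuid.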
anti	87412da1ca	test(clustering): F6 noise-floor ratchets for production clusterer Two targeted invariants instead of a wholesale YAML-bounds re-use, because the existing F6 bounds were tuned for the reference composite_signals_clusterer (fingerprint OR C2). The production clusterer trades that aggregation for tier discipline + the fingerprint-disagreement veto, so its score profile differs even when its judgments are correct — multi_operator stays as 2 truth identities, paused_campaign's two DSL actors remain a single cluster because they share fingerprints, etc. Wholesale bounds re-use would fight the design. The two production-side ratchets: 1. singleton_recall ≥ 0.95 at campaign-level scoring — truth-singleton noise scanners must not be absorbed into real campaigns. This is the F6 failure mode that motivates the fixture. 2. Intra-campaign recovery under cross-corpus interference: * vpn_hopping's 5 rotations consolidate to one cluster. * shared_wordlist A and B stay in disjoint clusters despite sharing credentials with each other (and with the noise floor). A future commit can revisit when the production clusterer's identity- level truth alignment improves (e.g. when paused_campaign's DSL is extended to mark its two actors as one truth identity).	2026-04-26 08:28:31 -04:00
anti	7923006203	test(clustering): F7 slow-burn time-agnostic invariant Fixture 7 ratchet: one campaign across 3 multi-week operational windows with stable JA3 + HASSH + C2. The production clusterer must fold all 3 into one cluster despite multi-week silence between windows; completeness = 1.0. Time-shift invariance test: applying a +90 day delta to every session start (and the per-attacker first/last seen) must produce the same cluster membership as the baseline. This is the runtime counterpart of the static no-time-fields check on Observation. If either check ever fails, the clusterer has accidentally grown a recency-aware edge — fixture 7's whole reason for existing.	2026-04-26 08:26:23 -04:00
anti	6a4592a8f5	test(clustering): low/very-low tier safety + F1/F2 ratchets Pins down the tier-discipline contract end-to-end: - Credentials-only overlap doesn't fuse observations (F1 in miniature). - ASN-only overlap doesn't fuse observations (F2 in miniature). - All three weak tiers (medium + low + very-low) stacked still don't fuse — only a high-tier signal does. - F1 (shared_wordlist) at identity-level: no false merges, every row is its own predicted cluster, homogeneity = 1.0. - F2 (vpn_hopping): 5 distinct ASNs collapse into 1 predicted cluster, proving JA3 / HASSH dominate ASN as the design requires. The combination math itself was wired in commit 5; this commit is the failure-mode regression suite that gates future tuning of the tier weights.	2026-04-26 08:25:23 -04:00
anti	ed323581fe	feat(clustering): fingerprint-disagreement veto for fixture 5 Two operators cooperating on one campaign can share C2 endpoints + stage-1 payloads while running distinct tooling — fixture 5 (multi_operator) is the canonical demonstration. The identity clusterer must NOT fuse them: shared infra is a campaign-level signal, not an identity-level one. The campaign clusterer (downstream work) handles that grouping over identities. Mechanism: when two observations have non-null fingerprints AND the fingerprints fully disagree, the high-weight tier drops the payload and C2 contributions to zero. JA3 / HASSH agreement still returns 1.0 directly — no veto applies when something agrees. Partial agreement (one slot agrees, another disagrees) is treated as agreement, since stable-tool partial overlap is more consistent with one identity than two. The veto only triggers when there is actual disagreement evidence — two un-fingerprinted observations sharing a C2 still cluster, since the absence of fingerprints is not the same as disagreement on them. Fixture 5 production-clusterer assertion added at identity level: ARI = 1.0, homogeneity = 1.0, exactly 2 predicted clusters from 2 truth identities. Phase-handoff edges (from the TODO) belong to the downstream campaign clusterer, not this identity clusterer.	2026-04-26 08:24:22 -04:00
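One way to read the veto mechanism in ed323581fe is sketched below. Observations are modeled as plain dicts, and `high_tier_score`, `payload_hash`, and `c2` are hypothetical names. The sketch also assumes a fingerprint slot is only comparable when both sides populate it, which the commit message does not spell out.

```python
FINGERPRINT_SLOTS = ("ja3", "hassh")


def high_tier_score(a, b):
    """High-tier edge score with the fingerprint-disagreement veto."""
    comparable = [
        (a[s], b[s])
        for s in FINGERPRINT_SLOTS
        if a.get(s) is not None and b.get(s) is not None
    ]
    # Any agreeing slot returns 1.0 directly; partial agreement counts.
    if any(x == y for x, y in comparable):
        return 1.0
    # Fingerprints present on both sides and every slot disagrees:
    # veto the payload and C2 contributions.
    if comparable:
        return 0.0
    # No disagreement evidence: shared payload or C2 still links the pair.
    same_payload = (
        a.get("payload_hash") is not None
        and a.get("payload_hash") == b.get("payload_hash")
    )
    shared_c2 = bool(set(a.get("c2", ())) & set(b.get("c2", ())))
    return 1.0 if (same_payload or shared_c2) else 0.0
```

Under this reading, fixture 5's two operators (distinct JA3 and HASSH, shared C2) score 0.0 against each other, while two un-fingerprinted observations sharing a C2 score 1.0.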
anti	f7da33726c	feat(clustering): combined edge weight + medium-tier wiring The clusterer now drops a single high-tier function call in favor of a tier-weighted sum. Tier multipliers (high=1.0, medium=0.6, low=0.2, very_low=0.05) are tuned so the threshold (1.0) admits high-tier agreement alone while leaving every weaker tier — and every combination of weaker tiers — under threshold. Per-tier discipline tested: - high alone clusters - medium alone does NOT cluster (supporting signal only) - low alone does NOT cluster (fixture 1's failure mode) - very-low alone does NOT cluster (fixture 2's failure mode) - all three weak tiers stacked still don't reach threshold - high + medium clusters (high already saturates) The combination is forward-compatible: low + very-low contributions are computed today but always project to 0.0 because the production adapter doesn't populate credentials / ASN-edge inputs into the fixture path yet. Their contribution becomes load-bearing in commit 7 when the low-tier landing tightens the F1 / F2 bounds. Fixture 4 (paused_campaign) ratchet added: high-tier signal carries the multi-day-silence campaign into one identity. Time-agnostic invariant — silence is irrelevant to the edge weight.	2026-04-26 08:22:10 -04:00
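The tier arithmetic in f7da33726c can be checked with a short sketch. The multipliers and the 1.0 threshold are the values quoted in the commit message; `combined_weight`, `should_cluster`, and the dict-of-scores input are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption.

```python
# Tier multipliers and threshold as quoted in the commit message.
TIER_WEIGHTS = {"high": 1.0, "medium": 0.6, "low": 0.2, "very_low": 0.05}
THRESHOLD = 1.0


def combined_weight(scores):
    """Tier-weighted sum of per-tier scores, each expected in [0, 1]."""
    return sum(TIER_WEIGHTS[tier] * score for tier, score in scores.items())


def should_cluster(scores):
    """Assumed inclusive comparison: high-tier agreement alone (1.0) passes."""
    return combined_weight(scores) >= THRESHOLD
```

With every weak tier saturated the sum is 0.6 + 0.2 + 0.05 = 0.85, which stays under the threshold, so only a high-tier signal can fuse two observations.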
anti	de2f4c3a62	feat(clustering): wire high-weight edges end-to-end The connected-components clusterer now writes attacker_identities rows + sets attackers.identity_id when high-weight signals (JA3 / HASSH / payload-hash / C2-endpoint exact match) agree across observations. Singletons stay un-fingerprinted and un-clustered. Algorithm split: - cluster_observations(observations) — pure union-find over the high-weight edge function. Same code path for fixture validation and production tick. - from_attacker_row(row) — production-row adapter; recovers JA3 + HASSH from Attacker.fingerprints JSON. Payload + C2 join from logs in later commits; the function shape doesn't change. Repo additions on BaseRepository + SQLModelRepository: - list_attackers_for_clustering(limit=None) - create_attacker_identity(row) - set_attacker_identity_id(attacker_uuid, identity_uuid) DummyRepo coverage stub updated. v1 behavior is conservative: only assigns identities to observations whose identity_id is currently NULL. Multi-identity components are skipped this pass — merge / re-assign lands in commit 10 with revocable merges. Fixture bounds tightened against the production clusterer: - lone_wolf (F3) — singletons stay singletons - shared_wordlist (F1) — credential-only overlap doesn't cluster (high-weight tier doesn't include credentials) - vpn_hopping (F2, identity-level) — 5 rotated IPs with stable JA3 + HASSH fold into one identity, ARI = 1.0, completeness = 1.0	2026-04-26 08:19:56 -04:00
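A minimal union-find sketch of the connected-components step in de2f4c3a62. The commit names `cluster_observations`, but the signature, the `Obs` dataclass, and the `high_edge` function here are illustrative assumptions that cover only the JA3 / HASSH slots.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Obs:
    uuid: str
    ja3: Optional[str] = None
    hassh: Optional[str] = None


def high_edge(a, b):
    """1.0 when any non-null fingerprint slot matches exactly, else 0.0."""
    for x, y in ((a.ja3, b.ja3), (a.hassh, b.hassh)):
        if x is not None and x == y:
            return 1.0
    return 0.0


def cluster_observations(observations):
    """Union-find over pairwise high-weight edges; returns sorted uuid groups."""
    parent = list(range(len(observations)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(observations)):
        for j in range(i + 1, len(observations)):
            if high_edge(observations[i], observations[j]) >= 1.0:
                parent[find(i)] = find(j)

    groups = {}
    for i, obs in enumerate(observations):
        groups.setdefault(find(i), []).append(obs.uuid)
    return sorted(sorted(group) for group in groups.values())
```

The pairwise loop is quadratic in the number of observations, which is fine for fixture-sized corpora; a production tick would more likely index observations by fingerprint value first.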
anti	a9775c4000	feat(clustering): similarity-graph primitives Adds the four weight-tier edge functions as pure, time-agnostic scoring primitives over an Observation projection. Each returns a score in [0, 1]; the connected-components impl will combine + threshold in subsequent commits. Tier semantics (from IDENTITY_RESOLUTION.md): - high — JA3/HASSH/payload-hash/C2-endpoint exact match - medium — phase-bucketed command-sequence Jaccard - low — credential-attempt-set Jaccard (defeated alone by F1) - very low — ASN equality (defeated alone by F2) Time-agnostic invariant is a static test: Observation has no time fields, so no edge function can silently start using them. Fixture 7 forbids recency-decay clustering on multi-month APT campaigns. A from_synthetic() adapter projects SyntheticAttacker corpora into Observation; the production-row adapter lands when the clusterer starts reading the attackers table.	2026-04-26 08:13:29 -04:00
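The low and very-low tiers in a9775c4000 reduce to set Jaccard and equality checks. The sketch below shows that shape; the function names are hypothetical, and scoring two empty sets or two unknown ASNs as 0.0 is an assumption (absence of data is treated as no evidence).

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, with two empty sets scored as 0.0 by assumption."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def low_tier_score(creds_a, creds_b):
    """Credential-attempt-set Jaccard, in [0, 1]."""
    return jaccard(creds_a, creds_b)


def very_low_tier_score(asn_a, asn_b):
    """ASN equality; unknown ASNs never match."""
    return 1.0 if asn_a is not None and asn_a == asn_b else 0.0
```

Neither function takes a timestamp, which mirrors the static time-agnostic invariant: an edge function cannot use recency if its inputs carry none.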
anti	fb522af107	feat(bus): reserve identity.unmerged topic Revocable merges (a contradiction-driven undo of identity.merged) ship in the clusterer work; this reserves the topic up-front so identity.> subscribers receive it day one without a re-subscribe. The clusterer worker's ClusterResult fan-out now publishes on identity.unmerged when populated. The skeleton clusterer never populates it; the revocable-merge commit will. Wiki update lives in wiki-checkout/Service-Bus.md (separate repo).	2026-04-26 08:10:56 -04:00
anti	e545f7d8d3	feat(clustering): identity clusterer worker skeleton Adds the decnet clusterer master-only command + provider-subpackage shape (base.py + factory.py + impl/connected_components.py) so subsequent commits can land similarity-graph features without churning callers. The skeleton ConnectedComponentsClusterer.tick is a no-op; the worker shell is fully wired (bus consumer on attacker.observed + attacker.scored, slow-tick fallback, health heartbeat, control listener, ClusterResult fan-out to identity.formed / observation.linked / merged). Subscribers on identity.> see no events from this clusterer until edge functions land, but the lifecycle is in place.	2026-04-26 08:09:11 -04:00
anti	6b6a808a4a	test(clustering): fixture 7 slow_burn + recency_decay reference Multi-month APT campaign modeling real APT operational tempo: recon over weeks, exploitation later, action-on-objectives later still. The unique signal this fixture stresses is TIME-AGNOSTIC IDENTITY across multi-week silences — a clusterer that silently expires old edges fragments any campaign that operates over months. Three DSL actors represent the operator's three operational windows (week 2, month 2, month 3 of a 90-day campaign), all sharing JA3 + HASSH + payload + C2 callback. Campaign-level fixture only — the three actors mint distinct truth_identity_id rows by design (same modeling caveat as fixtures 4 and 5). The fixture's narrative mirrors how an APT works a deep nested topology (DECNET MazeNET mode): map decoy networks for weeks, only then commit to exploitation. Slow-and-low pacing is the signal. recency_decay_clusterer added to fixture_harness — same edge construction as composite_signals_clusterer, but each edge weighted by exp(-time_distance / half_life_days) and dropped below a threshold. Adversarial reference for slow_burn: with 14-day half-life and 0.5 threshold, edges between operational windows (24+ days apart) decay below threshold and drop. The campaign fragments into three clusters; completeness collapses. This is the canonical production failure mode for graph clusterers that bound memory or bias toward "what's hot" by silently expiring old edges. Catching it in synthetic data is what fixture 7 exists for; the replay tier will surface real-world drift / dwell patterns that calibrate the half-life threshold the real algorithm should tolerate. 
Four tests: corpus shape (window-isolated sessions, stable fingerprint), pipeline pass via composite_signals_clusterer (time-agnostic — folds all three windows), adversarial fragmentation (3 clusters at 14-day half-life), long-half-life sanity (gentle decay unions everything; confirms behavior depends on the half-life parameter, not on something unrelated).	2026-04-26 07:58:23 -04:00
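The decay arithmetic behind the adversarial reference in 6b6a808a4a can be reproduced in a few lines. The formula, the 14-day parameter, and the 0.5 threshold come from the commit message; `decayed_weight` and `edge_survives` are hypothetical names, and keeping an edge that lands exactly on the threshold is an assumption.

```python
import math


def decayed_weight(base_weight, time_distance_days, half_life_days):
    """Edge weight scaled by exp(-time_distance / half_life_days)."""
    return base_weight * math.exp(-time_distance_days / half_life_days)


def edge_survives(base_weight, time_distance_days, half_life_days, threshold=0.5):
    """An edge is dropped once its decayed weight falls below the threshold."""
    return decayed_weight(base_weight, time_distance_days, half_life_days) >= threshold
```

With these parameters a full-weight edge falls under 0.5 once the distance exceeds 14 × ln 2, roughly 9.7 days, so windows 24 or more days apart cannot stay connected. A much longer `half_life_days` keeps the same edge well above the threshold, which is what the long-half-life sanity test relies on.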
anti	7021fda0e6	test(clustering): fixture 6 noise_floor (composite + cross-corpus) Bundles all five prior fixtures' campaigns into one corpus alongside 10 fresh Delivery-only noise scanners (on top of lone_wolf's 8 inherited). The fixture covers cross-corpus interference — signal collisions across fixtures' JA3/HASSH/C2 strings, factory ID re-use, clusterer ambiguity that only manifests when multiple campaigns score together. Each constituent fixture already ships its own in-fixture adversarial test; this one is the control for the class of failures that single-corpus fixtures cannot catch. Composition is declared via a fixture-6-specific include_fixtures block in noise_floor.yaml. The test file's loader expands it into a full corpus.campaigns spec at runtime so the factory itself stays unaware — no factory primitive added for what only this fixture needs. The 8 noise scanners declared by lone_wolf flow through naturally; the extra_noise_scanners count adds 10 more. composite_signals_clusterer (added in the fixture-5 commit) is the pass clusterer — union-find combining (ja3, hassh) match OR overlapping C2 callback. Approximates the planned similarity graph well enough that every campaign resolves and every singleton stays singleton in the merged corpus. Three tests: corpus integrity (every campaign id present, 12 campaign-driven attackers + 18 noise = 30 total), pipeline pass against the global bounds, and an explicit singleton-recall assertion (21 truth-singletons — 1 lone wolf, 18 noise, 2 shared_wordlist actors whose campaigns are size 1 — all kept singleton by the composite clusterer). Singleton recall is the load-bearing metric here: noise absorption is the failure mode that makes campaign attribution useless in practice.	2026-04-26 07:49:36 -04:00
anti	27f7de9886	test(clustering): fixture 5 multi_operator + c2/shift/composite refs Three new reference clusterers in fixture_harness: * c2_callback_clusterer — union-find on overlapping C2 callback sets across an attacker's sessions. Pass-clusterer for fixture 5 where two operators with distinct tooling share a C2 endpoint as the campaign signal. * shift_clusterer — deliberately-bad reference that buckets attackers by majority session-start hour into night/day/swing. Adversarial reference for fixture 5; proves operational schedule is NOT a campaign signal. * composite_signals_clusterer — union-find combining (ja3, hassh) match OR overlapping C2 callback. Will serve as the pass-clusterer for fixture 6 (noise_floor) where multiple campaigns with heterogeneous signal types are scored together. Also factored a small _union_find helper for the new clusterers (existing time_window/credential_jaccard left untouched to avoid mixing refactor with feature work). Fixture 5 (multi_operator): one campaign, two operators with distinct UKC roles. Actor A (broker, night shift): Delivery → Exploitation → Persistence → C2. Actor B (post-ex, day shift): Discovery → Lateral Movement → Collection → Exfiltration. Distinct JA3/HASSH/ASN/IPs; shared C2 + payload hash. Four tests: corpus shape (distinct fingerprints, shared C2, disjoint shifts), pipeline pass via c2_callback_clusterer, explicit harness sanity that fingerprint_clusterer cannot resolve this fixture (documents which signal carries the campaign), and adversarial shift_clusterer fragmentation. Phase-handoff edges (the real load-bearing signal per the design doc) wait for the production clusterer; this fixture will prove they're needed when it ships.	2026-04-26 07:46:14 -04:00
anti	304592abfe	test(clustering): fixture 4 paused_campaign + active_days/time_window Adds the actor.active_days primitive to the campaign factory so a DSL actor can be bound to specific day indexes. Falls back to the non-paused day pool when absent (existing fixtures unchanged). Intersects with pause_windows so the campaign-wide silence still wins if both are set. Adds time_window_clusterer reference to fixture_harness — union-find over attackers, edge if their session time-ranges are within gap_days of each other. Deliberately-bad reference for fixture 4: multi-day silent stretches fragment a single campaign because the clusterer has no signal that bridges the gap. Fixture 4 (paused_campaign): one campaign modeled as two DSL actors representing the operator's two operational windows (active days 1-2 and 6-7), separated by a silent stretch (days 3-5). Both share JA3 + HASSH + payload + C2 callback; only their active_days differ. Five tests: corpus shape (rows in their windows, shared signals), pipeline pass via fingerprint_clusterer at level=campaign, adversarial fragmentation via time_window_clusterer (1-day union threshold cannot bridge the 4-day silence → completeness collapses), huge-gap sanity (gap_days=10 unions both halves), silent-stretch invariant (no session leaks into the configured pause window). Identity-level scoring is fixture 2's job; this fixture is campaign-level only — modeling caveat documented in the YAML.	2026-04-26 07:39:46 -04:00
anti	0def6f7e37	test(clustering): fixture 2 vpn_hopping + fingerprint/asn references One campaign, one DSL actor, ip_pool: rotating + rotation_count: 5 across 5 synthetic private-use ASNs (RFC 6996 64512-64516). Stable JA3, HASSH, and payload_hash across every rotation — these are the "signals the attacker can't cheaply rotate" per IDENTITY_RESOLUTION.md and the load-bearing reason all 5 observation rows must resolve to one identity / one campaign. Two new reference clusterers in fixture_harness.py: * fingerprint_clusterer — groups by (ja3, hassh). Un-fingerprinted rows stay singleton so it doesn't trivially fuse all noise into one mega-cluster. Approximates the stable-signal arm of the planned similarity graph. * asn_clusterer — deliberately-bad reference for fixture 2's adversarial test. Group-by-ASN shatters the campaign into 5 singletons; completeness collapses to 0. Four tests in test_vpn_hopping_fixture.py: corpus shape (5 rows, 1 identity, 1 campaign, 5 distinct ASNs/IPs, stable fingerprints), pass at campaign level, pass at identity level (asserts ARI exactly 1.0), asn_clusterer breaches the completeness floor.	2026-04-26 07:34:18 -04:00
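The contrast between the two reference clusterers in 0def6f7e37 can be sketched as a group-by with different keys. This is an illustration only: `group_by`, `fingerprint_key`, `asn_key`, and the dict row shape are hypothetical, and the harness's real signatures are not shown in the log.

```python
from collections import defaultdict


def group_by(rows, key):
    """Group row ids by key(row); rows whose key is None stay singleton."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        k = key(row)
        bucket = ("singleton", index) if k is None else ("key", k)
        groups[bucket].append(row["id"])
    return sorted(groups.values())


def fingerprint_key(row):
    """(ja3, hassh) pair, or None for un-fingerprinted rows."""
    if row.get("ja3") is None and row.get("hassh") is None:
        return None
    return (row.get("ja3"), row.get("hassh"))


def asn_key(row):
    """Deliberately bad key: one cluster per ASN."""
    return row["asn"]
```

On five rotations that share one JA3 / HASSH pair but sit in five different ASNs, the fingerprint key yields a single cluster and the ASN key yields five singletons, which is the completeness collapse the adversarial test asserts.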


397 Commits