DECNET

Author	SHA1	Message	Date
anti	2cc60bd677	feat(realism): operator-tunable planner weights via realism_config New realism_config table (uuid PK + unique key) + two repo methods (get/set) backs an admin-only GET/PUT /api/v1/realism/config surface. The planner now exposes apply_payload(payload) / current_payload() / reset_to_defaults() and reads its weights through mutable module globals; pick() resolves the live values each call. Validation catches negative weights, zero totals, out-of-range canary_probability, unknown content_class names, and silently drops cross-list entries (canary class on the user list, etc). The orchestrator worker calls _refresh_realism_config(repo) on startup and every 5 ticks (~5min at 60s interval). Operator changes land within one refresh window with no bus signal — the simpler path for a knob whose latency tolerance is minutes.	2026-04-27 18:00:08 -04:00
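The validation rules listed in 2cc60bd677 could look roughly like the sketch below. The payload keys (`user_weights`, `canary_weights`), the class names, and the function name are assumptions made for illustration; only `canary_probability` and the rule list come from the commit message.

```python
# Hypothetical content-class vocabularies; the real planner's names may differ.
USER_CLASSES = {"note", "script", "spreadsheet"}
CANARY_CLASSES = {"canary_doc", "canary_creds"}


def validate_payload(payload: dict) -> dict:
    """Return a cleaned copy of payload, or raise ValueError."""
    cleaned = {}
    for key, allowed, other in (
        ("user_weights", USER_CLASSES, CANARY_CLASSES),
        ("canary_weights", CANARY_CLASSES, USER_CLASSES),
    ):
        weights = {}
        for name, weight in payload.get(key, {}).items():
            if name in other:
                continue  # cross-list entry: dropped silently
            if name not in allowed:
                raise ValueError(f"unknown content_class: {name}")
            if weight < 0:
                raise ValueError(f"negative weight for {name}")
            weights[name] = weight
        if sum(weights.values()) <= 0:
            raise ValueError(f"{key} must have a positive total")
        cleaned[key] = weights
    prob = payload.get("canary_probability", 0.0)
    if not 0.0 <= prob <= 1.0:
        raise ValueError("canary_probability must be within [0, 1]")
    cleaned["canary_probability"] = prob
    return cleaned
```

A router could map the ValueError to a 400 response; that mapping is likewise an assumption.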
anti	87cb61c8b2	feat(realism): synthetic-files browser API Adds GET /api/v1/realism/synthetic-files (paginated list, filters by decky_uuid, persona, content_class) and GET /api/v1/realism/synthetic-files/{uuid} (single row with last_body and a truncated:bool flag set when the stored body is at the 64KB cap). Repo gains count_synthetic_files() and get_synthetic_file(uuid). The list view drops last_body to keep the wire payload bounded; the detail endpoint is the only path that returns it. Read-only — orchestrator remains the sole writer.	2026-04-27 17:44:53 -04:00
anti	32eeb0c813	refactor(orchestrator): collapse decnet-emailgen.service into orchestrator Stage 5 of the realism migration. Email generation is no longer a separate worker / systemd unit / CLI subcommand — the orchestrator's single tick loop covers SSH traffic, file plants, and email drops. Going from 21 services to 20. Worker: - _one_tick rolls between traffic / file / email (45/45/10 weights). The 10% email weight at a 60s orchestrator interval produces ~one email per 10 minutes, close to the pre-collapse 5-minute cadence. - get_driver_for(action) (stage 4) handles SSH vs Email dispatch. - Quiet branches fall through so a (decky-set, persona-pool, mail-decky) shape that silences one branch doesn't waste the tick. - Periodic prune covers both orchestrator_events and orchestrator_emails tables. Deletions: - deploy/decnet-emailgen.service.j2 - decnet/orchestrator/emailgen/worker.py - decnet/cli/emailgen.py - tests/orchestrator/emailgen/test_worker_integration.py Renames (history-preserving): - decnet/web/router/emailgen/ -> decnet/web/router/realism/ - tests/api/emailgen/ -> tests/api/realism/ - tests/cli/test_emailgen_* -> tests/cli/test_realism_* Public surface changes (clean break, pre-v1): - API URL /api/v1/emailgen/personas -> /api/v1/realism/personas - CLI `decnet emailgen import-personas` -> `decnet realism import-personas`. `decnet emailgen run` is gone — the orchestrator covers it. - gating.py: emailgen master-only group replaced by realism. - decnet-orchestrator.service.j2: DECNET_REALISM_* env block added. - decnet.target: decnet-emailgen.service entry removed. - frontend: PersonaGeneration.tsx fetches /realism/personas.	2026-04-27 16:33:04 -04:00
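A minimal sketch of the 45/45/10 roll from 32eeb0c813, with one possible way to get the quiet-branch fall-through: branches that have nothing to do are left out of the roll. The function names are hypothetical and the real `_one_tick` may order its fall-through differently.

```python
import random
from typing import Optional

WEIGHTS = {"traffic": 45, "file": 45, "email": 10}  # from the commit message


def pick_branch(runnable: set, rng: random.Random) -> Optional[str]:
    """Weighted roll over the branches that can act this tick.

    A quiet branch is excluded, so the tick falls through to one of the
    remaining branches instead of being wasted.
    """
    candidates = [b for b in WEIGHTS if b in runnable]
    if not candidates:
        return None
    return rng.choices(candidates, weights=[WEIGHTS[b] for b in candidates])[0]


def expected_minutes_between(branch: str, interval_s: float = 60.0) -> float:
    """Mean gap between actions of one branch when every branch is runnable."""
    return interval_s * sum(WEIGHTS.values()) / WEIGHTS[branch] / 60.0
```

With a 60s interval the email branch fires on 10 of every 100 ticks on average, which is where the "one email per 10 minutes" figure comes from.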
anti	0b9873982d	refactor(realism): move emailgen LLM/personas/prompt into shared library Lift the format-agnostic pieces from decnet/orchestrator/emailgen/ into the new decnet/realism/ library so file-class content generation (stage 3 of the realism migration) can reuse them. Email-specific delivery (RFC 2822 EML, IMAP/POP3 spool, thread chains) stays in orchestrator/. Renames (history-preserving git mv): emailgen/personas.py -> realism/personas.py emailgen/prompt.py -> realism/prompts/email.py emailgen/global_pool.py -> realism/personas_pool.py emailgen/llm/ -> realism/llm/ Env-var clean break (pre-v1, no aliases): DECNET_EMAILGEN_LLM -> DECNET_REALISM_LLM DECNET_EMAILGEN_MODEL -> DECNET_REALISM_MODEL DECNET_EMAILGEN_TIMEOUT -> DECNET_REALISM_TIMEOUT DECNET_EMAILGEN_PERSONAS -> DECNET_REALISM_PERSONAS DECNET_EMAILGEN_FAKE_OUTPUT -> DECNET_REALISM_FAKE_OUTPUT Importers rewritten in: orchestrator/emailgen/scheduler.py, orchestrator/drivers/email.py, web/router/{emailgen,topology}/ api_personas.py, cli/emailgen.py. Tests for moved modules relocated to tests/realism/; tests for stay-put modules updated in place. API URL `/api/v1/emailgen/personas` and CLI `decnet emailgen import-personas` keep their public names until the service-collapse commit (stage 5).	2026-04-27 16:05:43 -04:00
anti	6c4ea706f8	feat(api): canary token CRUD router (/api/v1/canary) + tests Two sub-routers under /api/v1/canary: blobs (operator-uploaded artifacts, deduped by sha256): - POST /blobs (multipart upload; admin) - GET /blobs (list with token_count; admin) - DELETE /blobs/{uuid} (refcount-aware; 409 when referenced; admin) tokens (per-decky planted artifacts): - POST /tokens (generate or instrument + plant; admin) - GET /tokens?decky_name=&kind=&state= (filter; viewer) - GET /tokens/{uuid} (detail; viewer) - GET /tokens/{uuid}/preview (instrumented bytes; admin) - GET /tokens/{uuid}/triggers (paged callback log; viewer) - DELETE /tokens/{uuid} (revoke + bus event; admin) XOR validation: exactly one of blob_uuid / generator must be set. Path validation rejects relative/NUL/newlines/.. segments. Every body-bearing route documents 400 plus 401/403/404 as applicable. Stdlib MIME sniffer (no python-magic dep) covers PNG/JPEG/GIF/PDF/ HTML/XML/DOCX/XLSX/JSON/YAML/TOML/text/plain; everything else falls through to passthrough. Tests run end-to-end through the live FastAPI app (planter docker exec is patched); 17 cases covering dedup, refcount, lifecycle, XOR validation, path validation, and 404 paths.	2026-04-27 13:18:00 -04:00
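The XOR and path checks described in 6c4ea706f8 can be sketched as below. The function name is hypothetical, and treating a carriage return like a newline is an assumption; `blob_uuid` and `generator` are the field names the commit mentions.

```python
from typing import Optional


def validate_token_request(blob_uuid: Optional[str], generator: Optional[str], path: str) -> None:
    """Raise ValueError (which a router could map to 400) on a bad request."""
    # XOR: exactly one source for the artifact bytes.
    if (blob_uuid is None) == (generator is None):
        raise ValueError("exactly one of blob_uuid / generator must be set")
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    if any(ch in path for ch in ("\x00", "\n", "\r")):
        raise ValueError("path contains NUL or newline")
    if ".." in path.split("/"):
        raise ValueError("path contains a '..' segment")
```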
anti	f046634d6e	feat(web): Persona Generation page under AUTOMATION New dashboard surface for editing the global emailgen persona pool — the JSON file fleet (MACVLAN/IPVLAN) and SWARM-shard mail deckies pull from. MazeNET topology personas are out of scope here; they're configured per-topology in the topology editor. Backend: * GET/PUT /api/v1/emailgen/personas — admin-write, viewer-read. PUT validates with the same Pydantic schema the worker uses (parse_personas), drops invalid entries with a warning, returns 400 only when the entire payload fails. Path is operator-discoverable on every response so a CLI-driven backup workflow stays visible. Frontend: * PersonaGeneration.tsx + .css — table + add/edit modal with the full EmailPersona schema (name, email, role, tone, mannerisms list, language, signature, active hours, reply latency, uses_llms_heavily). Local edits are batched; explicit "SAVE CHANGES" writes back, with a dirty-indicator pill and a "DISCARD" reset. Email uniqueness is enforced client-side so the scheduler never picks the same persona as both sender + recipient. * Sidebar AUTOMATION group gains a "Persona Generation" entry next to Orchestrator; route registered at /persona-generation. The worker reads the same on-disk file the API writes — see decnet.orchestrator.emailgen.global_pool. The API resets the in-process cache on every read/write so the worker picks up dashboard edits within its next tick rather than waiting on mtime.	2026-04-27 09:55:42 -04:00
anti	818aebadfc	feat(web): emailgen events in Orchestrator page The SSE pipe at /orchestrator/events/stream was already streaming 'orchestrator.email.{decky_uuid}' events (the subscription is for the 'orchestrator.>' wildcard), but the consumer side dropped them on the floor. Three fixes to close the loop: * useOrchestratorStream.ts now registers an 'email' SSE listener — the EventSource silently ignores frames whose event name has no listener, so missing this entry meant every email frame was dropped before reaching the page's onEvent handler. * /api/v1/orchestrator/events accepts kind=email and dispatches to list_orchestrator_emails, adapting rows to the existing wire shape: subject -> action, sender_email -> src_decky_uuid, recipient_email -> dst_decky_uuid, plus email-specific extras (thread_id, language, mail_decky_uuid, message_id, in_reply_to) ride along as top-level keys. * Orchestrator.tsx gains an 'email' tab in the kind filter and a branch in the row renderer / inspector that: - shows full sender / recipient (no UUID truncation), - chips the language code next to the subject, - relabels ACTION as SUBJECT in the inspector and surfaces thread / in-reply-to / mail-decky details. The 'all' tab continues to show traffic+file only (today's behavior); operators see emails by switching to the email tab. A union view at the API layer is the obvious follow-up but not necessary for now.	2026-04-26 22:56:48 -04:00
anti	c3518e3159	feat(workers): surface clusterer, campaign-clusterer, reconciler in panel The Workers panel (Config → Workers tab) hardcodes its row list in KNOWN_WORKERS — by design, so a rogue publisher can't inject UI rows. Three heartbeat-emitting workers were missing: * clusterer — behavioral clustering (decnet/clustering/) * campaign-clusterer — campaign assembly (decnet/clustering/campaign/) * reconciler — host-local fleet convergence (added in `430262e`) Each already publishes on system.<name>.health via run_health_heartbeat, so they show up live the moment they're added to the registry — no frontend or subscriber wiring needed (Config.tsx renders whatever /workers returns). Also added to _PREFERRED_ORDER in start-all so START ALL WORKERS brings them up in dependency-friendly order: data-plane → reconciler → intel → clustering → output → orchestrator. Three deployable units (listener, web, swarmctl) intentionally remain absent from KNOWN_WORKERS — they don't emit heartbeats (CLI / static server / one-shot tooling), so they'd permanently render as UNKNOWN and confuse operators. Adding them is a separate decision that needs a "synthesize installed-but-silent rows" pass on the registry.	2026-04-26 21:31:34 -04:00
anti	8814902999	docs(api): clarify fleet_deckies + JSON dual-write happens in engine.deployer The unihost API path delegates to engine.deployer.deploy(), which now writes both decnet-state.json (existing) and the fleet_deckies DB table (added in `646aeec`). Comment makes the single-sink design explicit so future maintainers don't add a parallel save_state / upsert_fleet_decky call here. No behavioral change — every fleet-creation path on every host (CLI deploy, this unihost API path, and per-worker SWARM agent deploys) already routes through the engine.deployer single sink.	2026-04-26 21:08:44 -04:00
anti	5b5ff54fa2	feat(web): orchestrator events read API + SSE stream GET /api/v1/orchestrator/events — paginated list with optional kind=traffic\|file filter. GET /api/v1/orchestrator/events/stream — SSE: snapshot on connect, live forward of orchestrator.> bus events mapped to 'traffic' / 'file' SSE event names. Repo gains list_orchestrator_events(limit, offset, kind?, since_ts?), count_orchestrator_events(kind?), and prune_orchestrator_events (per_dst_cap=10000) for periodic worker-side trimming.	2026-04-26 19:58:12 -04:00
anti	4c37ece39e	feat(orchestrator): MVP synthetic life-injection worker (SSH only) Adds a new decnet orchestrate worker whose job is to keep the honeypot ecosystem from looking suspiciously static — a frozen LAN with no inter-host traffic and no filesystem aging is its own honeypot tell. MVP scope: - New OrchestratorEvent table + repo methods (purpose-built sibling to Log so synthetic events stay separable from attacker-driven ones). - New orchestrator.{activity,file}.<decky_id> bus topics + system.orchestrator.health heartbeat. - SSH-only driver. Traffic action runs python3 inside src container to TCP-connect dst:22 and read the SSH banner — real on-the-wire SSH-protocol traffic without shipping creds. File action drops or refreshes a small file via docker exec on the destination. - Random scheduler (50/50 traffic/file when >=2 SSH-capable deckies are running). Diurnal shaping, role-aware pairing, and session-aware backoff are explicit non-goals for MVP. - CLI registration, systemd unit (SupplementaryGroups=docker), worker-registry entry so the dashboard shows orchestrator health. - 11 tests: scheduler policy, driver argv shape + injection-safety, end-to-end one-tick integration with FakeBus + SQLite.	2026-04-26 19:43:20 -04:00
anti	d531cea536	feat(web): read-only campaigns API + SSE + frontend API: /api/v1/campaigns (paginated list), /api/v1/campaigns/{uuid} (soft-merge chain follow), /api/v1/campaigns/{uuid}/identities (member identities), and /api/v1/campaigns/events (SSE under campaign.> + JWT-via-?token=, snapshot-on-connect). Mirror of the identity router; same auth, same shape, same OpenAPI tags pattern. Frontend: CampaignDetail.tsx page (same visual vocabulary as IdentityDetail), useCampaignStream hook (mirror of useIdentityStream), /campaigns/:id route, IdentityDetail's CAMPAIGN badge becomes clickable and navigates to the campaign. useIdentityStream now listens for identity.campaign.assigned so the badge appears live without a manual refresh.	2026-04-26 09:20:17 -04:00
anti	97aa57faed	feat(api): SSE stream for identity events at /api/v1/identities/events Mirrors GET /api/v1/topologies/{id}/events: subscribes to identity.> on the bus for the duration of the request and forwards each event as a named SSE frame (formed / observation.linked / merged / unmerged). The endpoint is broadly scoped (every identity event, not per-uuid) because both AttackerDetail and IdentityDetail need the same firehose: AttackerDetail watches for an identity.formed that finally binds its identity_id; IdentityDetail watches for observation.linked / merged / unmerged against its current row. A per-uuid filter would force the client to know its identity before subscribing, which it doesn't always. JWT via ?token= (EventSource can't set headers), require_stream_viewer gate, sse_connection_slot per-user cap, snapshot-on-connect with the first 50 identities so the client buffer renders without a separate REST call. Bus-disabled / unreachable path keeps the connection alive on keepalives so the client doesn't reconnect-storm; it can re-poll the REST API on its own timer.	2026-04-26 08:36:17 -04:00
anti	dc3d08dd41	feat(web): read-only /api/v1/identities/* endpoints + repo methods Second of the five-step identity-resolution substrate. Ships the API surface against the empty AttackerIdentity table from commit 1 — every endpoint returns empty/404 cleanly until the clusterer populates rows. Routes (auth-gated, viewer role): * GET /api/v1/identities — paginated list, excludes merged-out rows * GET /api/v1/identities/{uuid} — detail; transparently follows merged_into_uuid to surface the canonical winner * GET /api/v1/identities/{uuid}/observations — Attacker rows FK'd to the (resolved) identity uuid Repository (BaseRepository abstract + SQLModelRepository concrete): * get_identity_by_uuid (with merge-chain following, hop-bounded) * list_identities / count_identities (excluding merged-out) * list_observations_for_identity / count_observations_for_identity Tests: 12 new (empty-table behavior, seeded data, merge-chain resolution, repo-level smoke against real SQLite). Also fixes the pre-existing test_base_repo_coverage failure (DEBT-041 added abstract methods without updating the DummyRepo stub) — included here because this PR adds 5 more abstract methods, fixing it as a bonus. 474 db/web/profiler/correlation tests green.	2026-04-26 07:08:55 -04:00
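The hop-bounded merge-chain following in dc3d08dd41 might work along these lines. This is a sketch over an in-memory mapping rather than the real repository; the bound of 8 and the function name are assumptions, since the commit only says "hop-bounded".

```python
from typing import Dict, Optional

MAX_HOPS = 8  # assumed bound


def resolve_identity(rows: Dict[str, Optional[str]], uuid: str) -> Optional[str]:
    """Follow merged_into_uuid pointers to the canonical identity.

    rows maps identity uuid -> merged_into_uuid (None for a canonical row).
    Returns None if the uuid is unknown or the chain exceeds MAX_HOPS,
    which also guards against accidental cycles.
    """
    current = uuid
    for _ in range(MAX_HOPS + 1):
        if current not in rows:
            return None
        nxt = rows[current]
        if nxt is None:
            return current
        current = nxt
    return None
```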
anti	3eb67c9400	refactor(intel): re-key attacker_intel on attacker_uuid (closes DEBT-041) The threat-intel surface was IP-keyed on day one as an expedient — the worker is woken by IP-bearing bus events. ANTI's call: don't carry that debt. NO IPs as primary keys anywhere on the attacker-intel surface. Schema: - attacker_uuid is now the canonical key — UNIQUE + FK to attackers.uuid. - attacker_ip stays as a denormalised, indexed, NON-UNIQUE value column. Updated on every upsert; useful for SIEM payloads and audit lookups, but explicitly NOT a key. Model docstring says so. - Pre-v1, no Alembic migration needed. SQLModel.metadata.create_all() builds the new shape on fresh DBs. Repo: - upsert_attacker_intel now keys on attacker_uuid. - get_attacker_intel_by_ip → get_attacker_intel_by_uuid. - get_unenriched_attacker_ips → get_unenriched_attackers, returning [{uuid, ip}] tuples so the worker writes by UUID and dispatches provider calls by IP without a second round-trip. Worker: - _enrich_one(uuid, ip, ...) — UUID lands on the row, IP rides for provider egress. - attacker.intel.enriched bus payload gains attacker_uuid alongside attacker_ip — webhook → SIEM consumers benefit; no removal. API: - GET /api/v1/attackers/{ip}/intel deleted outright (rip-and-replace, never deployed beyond dev). - GET /api/v1/attackers/{uuid}/intel is the only public route, matching every other /attackers/* route. Frontend: - <IntelPanel uuid={id!} /> uses the URL param directly, fetches in parallel with the rest of AttackerDetail rather than waiting on attacker.ip. Tests: re-keyed in place, 39 passed (same coverage as before the refactor). Provider-impl tests untouched. DEBT-041: closed in DEBT.md (entry preserved as historical rationale, summary table flipped to ✅, remaining-open list shortened by one).	2026-04-26 05:35:29 -04:00
anti	8a6d632ab0	feat(deploy): systemd unit for decnet-enrich + register in worker panel Mirrors decnet-reuse-correlator.service.j2: same hardening posture (NoNewPrivileges, ProtectSystem=full, etc.), same restart policy, same log file convention. The decnet init renderer picks it up automatically via the decnet-*.service.j2 glob. Also reconciles a naming inconsistency I shipped earlier: the heartbeat name was 'intel' (the package) but the CLI command and unit are 'enrich' (the action). Renamed the heartbeat to 'enrich' so the workers panel displays the same string the operator types and the same string in the systemd unit file. Convention across the project: heartbeat name = registry key = unit basename = CLI command name. Registers 'enrich' in worker_registry.KNOWN_WORKERS and in the start-all preferred order. The decnet.target Wants= list also picks up the new unit so 'systemctl start decnet.target' brings everything up together.	2026-04-26 05:20:54 -04:00
anti	d3d9bd5aa7	feat(intel): `decnet enrich` CLI + GET /attackers/{ip}/intel endpoint CLI command mirrors the reuse-correlate shape (--poll-interval, --ttl-hours, --daemon). Run it under systemd as a sibling worker. The API endpoint returns the most recent cached row for an attacker IP or 404. Auth-gated via require_viewer like every other attacker route. Also extends the worker test with a real FakeBus so the attacker.intel.enriched publish path is exercised end-to-end (no longer a no-op against NullBus).	2026-04-26 05:17:25 -04:00
anti	a455248dd9	feat(deploy): systemd unit for decnet-reuse-correlator Adds the systemd template for the credential-reuse correlator daemon and wires it into decnet.target so `decnet init` installs it automatically (the unit installer globs decnet-*.service.j2). Mirrors the mutator template: bus-woken Type=simple service with the standard hardening + on-failure restart. Also registers `reuse-correlator` in the in-process worker registry (so the dashboard panel surfaces its heartbeat instead of dropping it as unknown) and slots it into the start-all preferred order between mutator and webhook.	2026-04-26 04:29:10 -04:00
anti	181c792753	feat(api): GET /credential-reuse list + detail endpoints Read-only routes for the credential-reuse findings produced by the correlator. Mirrors the /credentials route shape: JWT-gated via require_viewer, paginated with optional secret_kind / min_target_count filters, and a 404-on-missing detail route. No POST/PUT/PATCH (and no body parsing) so no 400 contract is documented.	2026-04-26 03:40:08 -04:00
anti	4566146d50	feat(api): GET /credentials endpoint Surfaces the Credential table (deduped attacker auth attempts) via a new /api/v1/credentials route. Mirrors the Bounty cache pattern (5s TTL on the unfiltered default page) and reuses the existing get_credentials / get_total_credentials repo methods + the already defined CredentialsResponse DTO. Filters: search, service, attacker_ip.	2026-04-25 07:51:20 -04:00
anti	ee176a6f79	Revert "feat(mazenet): per-LAN swarm host pin" This reverts commit `0d92170a57`.	2026-04-25 03:26:19 -04:00
anti	0d92170a57	feat(mazenet): per-LAN swarm host pin Adds nullable LAN.host_uuid (FK swarm_hosts.uuid). Resolution order when deploying a LAN: lan.host_uuid → topology.target_host_uuid → master. A LAN is one Docker bridge so the bridge cannot span hosts; this pin forces every decky in the LAN onto the named host. LANCreateRequest / LANUpdateRequest accept host_uuid; both validate that the host exists, returning 400 on unknown UUIDs. PATCH still gated by the existing pending-only guard, so reassignment of a live LAN is not yet possible (deferred to mutator support). LANRow surfaces the field so the frontend can render per-host badges.	2026-04-25 03:04:23 -04:00
anti	05d225ae38	fix(engine): surface CalledProcessError.stderr in deploy-failure log + status reason str(CalledProcessError) is just 'Command ... returned non-zero exit status N' — the stderr (where the buildx recovery hint lives) was being silently dropped from both the deploy log line and the persisted 'failed' status reason. New _format_subprocess_error helper appends .stderr when the exception is a CalledProcessError. Applied to transition_status reason and the background-deploy log message so operators and the UI see the real failure, not just the exit code. This is what makes the buildx preflight hint from `86b9dec` actually reach the user.	2026-04-24 19:31:37 -04:00
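A guess at what a helper like `_format_subprocess_error` from 05d225ae38 does, based only on the commit message: keep `str(exc)` and append `.stderr` when the exception is a `CalledProcessError`. The output layout and the public name used here are assumptions.

```python
import subprocess


def format_subprocess_error(exc: BaseException) -> str:
    """Return str(exc), plus stderr when the exception carries one."""
    message = str(exc)
    if isinstance(exc, subprocess.CalledProcessError) and exc.stderr:
        stderr = exc.stderr
        if isinstance(stderr, bytes):
            # stderr is bytes unless the process was run in text mode.
            stderr = stderr.decode("utf-8", errors="replace")
        message = f"{message}\nstderr: {stderr.strip()}"
    return message
```

Note that `stderr` is only populated when the subprocess was started with stderr captured (for example `capture_output=True`); otherwise the helper falls back to the plain message.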
anti	c214cdd7bb	fix(api/topology): map duplicate-name IntegrityError to 409 POST /topologies raised a 500 with a raw SQLAlchemy IntegrityError traceback when the name collided with an existing topology. Catch the error at the router, verify it's the ix_topologies_name constraint (so unrelated integrity failures still surface as 500s with their real traceback), and return 409 with a helpful detail. Test covers the create-then-duplicate-create flow.	2026-04-24 19:06:37 -04:00
anti	f3408d5e62	fix(topology/allocator): widen default subnet base to /12 for mass-scale A 30-LAN generate request already fits in 172.20.0.0/16, but trees with depth/branching that multiply past 256 (e.g. depth=6, branching=4 ≈ 5k LANs) hit AllocatorExhausted before the first write. SubnetAllocator now accepts a full CIDR base ("172.16.0.0/12" → 4096 /24s) in addition to the legacy two-octet shorthand ("172.20", auto-lifted to /16). The parent must be ≤/24; a /24 base yields exactly one slot. Iteration order is preserved for /16 bases so existing topologies keep their third-octet sweep; /12 adds a second-octet dimension underneath. Defaults bumped to 172.16.0.0/12: TopologyConfig.subnet_base_prefix, /next-subnet query param, and the mutator's add-LAN fallback. The field pattern widens to accept CIDR. create-blank and manual LAN CRUD still use "10.0" (lifts to /16) — one DMZ LAN per topology, 256 is plenty.	2026-04-24 18:57:55 -04:00
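The slot arithmetic in f3408d5e62 can be checked with the standard `ipaddress` module. The sketch below is not the project's SubnetAllocator; the function names are hypothetical and it only reproduces the base lifting and the /24 counts.

```python
import ipaddress
from typing import Iterator


def lift_base(base: str) -> ipaddress.IPv4Network:
    """Accept a CIDR ("172.16.0.0/12") or two-octet shorthand ("172.20")."""
    if "/" not in base:
        base = f"{base}.0.0/16"  # legacy shorthand is lifted to a /16
    net = ipaddress.ip_network(base)
    if net.prefixlen > 24:
        raise ValueError("base must be /24 or wider")
    return net


def iter_slots(base: str) -> Iterator[ipaddress.IPv4Network]:
    """Yield every /24 inside the base, in ascending order."""
    return lift_base(base).subnets(new_prefix=24)
```

A /12 base leaves 24 - 12 = 12 free bits, hence 2^12 = 4096 slots; a /16 leaves 8 bits, hence 256.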
anti	c78ab032bd	fix(xff): truncate LEAKED IPs + ROTATION badge for rotation attacks `for i in $(seq 1 100); do curl -H "X-Forwarded-For: 191.100.20.$i" ...` was dumping 100 distinct IPs into AttackerDetail's LEAKED IPs row, drowning the rest of the ORIGIN section. The 100-IP wall is itself a signal (WAF-bypass-list probing) that deserves a short badge, not a flood. Backend: - get_attacker_ip_leaks gains `limit: int = 10` parameter — caller only ever needs a sample, not the full set. - New count_attacker_ip_leaks() returns the unbounded COUNT(*) via one cheap SQL aggregate. - Detail endpoint returns {ip_leaks: [first 10], ip_leaks_total: N} so the UI can render a rotation badge independent of list length. UI: - New LeakedIPsRow component. First 5 distinct IPs rendered inline with hover tooltips (unchanged). When > 5, a `+ N more` expand button reveals the rest of the sample; when total exceeds the 10-row cap, a subtle `(+M beyond sample)` note appears. - When total ≥ 20, a red `ROTATION · N` tag renders leading the row with a tooltip explaining the semantic: "almost certainly XFF-rotation / WAF-bypass probing, not a real attribution leak." DB churn is deliberately not capped — 100k rows × ~500 B is tolerable. If it becomes a problem we can add an ingester-side count-and-skip; for now the UX fix is the whole story. Added test_ip_leaks_total_reported_separately_from_list asserting the endpoint shape matches what the UI consumes.	2026-04-24 18:25:46 -04:00
anti	2a0c5ca410	feat(attackers): XFF mismatch detection — attacker IP leak bounties Attackers routinely front their scanners with VPNs/proxies, so the TCP source we log is the proxy egress, not the real host. But a surprising number of attacker setups are misconfigured: the proxy forwards the real IP in an X-Forwarded-For (or Forwarded / X-Real-IP / CDN-variant) header. From our side that's a free attribution leak. New _detect_ip_leak extractor in decnet/web/ingester.py fires at ingest time per HTTP request. Logic: 1. Require service=http, source_ip present, headers present. 2. If source_ip ∈ DECNET_TRUSTED_PROXIES (comma-separated IPs or CIDRs) → legitimate reverse-proxy forwarding, skip. 3. Walk proxy-family headers in priority order: Forwarded (RFC 7239) → X-Forwarded-For → X-Real-IP → True-Client-IP → CF-Connecting-IP. 4. Extract the left-most parseable IP from the winning header. 5. If that IP differs from the TCP source → emit a bounty with bounty_type="ip_leak" carrying {source_ip, real_ip_claim, source_header, headers_seen, path, method}. Storage is the existing Bounty table — no schema change; de-dup is handled by Bounty's (attacker_ip, bounty_type, payload_hash) key, so repeat requests with the same leaked IP don't spam. AttackerDetail renders a warn-accent "LEAKED IPs:" row under ORIGIN listing distinct real_ip_claim values; hover tooltip shows the source header + path of the most recent leak. Only shown when at least one ip_leak bounty exists. RFC 7239 Forwarded parser handles the full vocabulary — bare IPv4, IPv4:port, quoted, IPv6 in brackets, IPv6 with port — returning only IPs that actually parse. Closes DEVELOPMENT.md "Network Topology Leakage → X-Forwarded-For mismatches". Phase 3 of the three-phase Attacker Intelligence series (phases 1: scanned-vs-interacted, 2: PTR records already shipped). DECNET_TRUSTED_PROXIES env shape matches THREAT_MODEL DA-08's "revisit when verified-proxy config lands" note — same token set future rate-limit work will consume.	2026-04-24 17:39:03 -04:00
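The five-step logic in 2a0c5ca410 can be sketched as follows. This is a simplified illustration, not the project's `_detect_ip_leak`: the RFC 7239 `Forwarded` header is left out because its `for=` syntax needs a dedicated parser, the returned dict carries only three of the listed fields, and returning nothing when the winning header has no parseable IP is an assumption.

```python
import ipaddress
from typing import Optional

# Priority order from the commit message, minus Forwarded (RFC 7239).
HEADER_ORDER = ("x-forwarded-for", "x-real-ip", "true-client-ip", "cf-connecting-ip")


def is_trusted(source_ip: str, trusted: str) -> bool:
    """trusted is a comma-separated list of IPs or CIDRs."""
    addr = ipaddress.ip_address(source_ip)
    for token in filter(None, (t.strip() for t in trusted.split(","))):
        if addr in ipaddress.ip_network(token, strict=False):
            return True
    return False


def detect_ip_leak(source_ip: str, headers: dict, trusted: str = "") -> Optional[dict]:
    if is_trusted(source_ip, trusted):
        return None  # legitimate reverse-proxy forwarding
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in HEADER_ORDER:
        if name not in lowered:
            continue
        for candidate in lowered[name].split(","):
            try:
                claim = ipaddress.ip_address(candidate.strip())
            except ValueError:
                continue  # keep looking for the left-most parseable IP
            if claim != ipaddress.ip_address(source_ip):
                return {
                    "source_ip": source_ip,
                    "real_ip_claim": str(claim),
                    "source_header": name,
                }
            return None  # header agrees with the TCP source
        return None  # assumed: winning header had no parseable IP
    return None
```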
anti	351a8939c3	feat(attackers): scanned vs. interacted service bucketing on detail page Adds a new card on AttackerDetail: SCANNED · N services \| INTERACTED WITH · M services. Distinguishes port-scanners (N high, M=0) from actual engagement (M>0) at a glance — the analyst's first question when triaging a new attacker row. Classifier lives in decnet/correlation/event_kinds.py, a single source of truth for the event-type vocabulary: - INTERACTION_EVENT_TYPES — command-family (command/exec/query/...), SMTP engagement (mail_from/rcpt_to/message_accepted), file/payload activity (file_captured/upload/download_attempt/retr), pub/sub (publish/subscribe), recorded TTY sessions. - NOISE_EVENT_TYPES — DECNET-internal (startup/shutdown/parse_error/ unknown_*). - Everything else defaults to scan. Conservative by design: new template verbs show up as "scanned" until explicitly promoted. Bucket logic: a service is "interacted" if ≥1 of its events classifies as interaction; otherwise "scanned" if ≥1 scan event; noise-only services drop. Disjoint by construction. Deliberate no-schema path: compute on-the-fly in the detail endpoint via SELECT DISTINCT service, event_type FROM logs. Small result set (tens of pairs per attacker), cost is trivial vs. the existing behavior/commands queries. Trade-off: one more DB round-trip per detail view in exchange for zero ALTER TABLE migration pain and immediate classifier-change feedback loop. Profiler's _COMMAND_EVENT_TYPES stays as-is (strict subset of interactions that carry executable text), with a comment pointing at the new canonical module. Closes DEVELOPMENT.md "Attacker Intelligence §Service-Level Behavioral Profiling — Services actively interacted with".	2026-04-24 17:12:20 -04:00
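The bucket rule from 351a8939c3 is small enough to sketch in full. The two sets below are abbreviated from the commit's description and `bucket_services` is a hypothetical name; the input is the kind of (service, event_type) pair list a `SELECT DISTINCT` would return.

```python
# Abbreviated vocabularies; the commit lists more members than shown here.
INTERACTION_EVENT_TYPES = {"command", "exec", "query", "mail_from", "rcpt_to", "upload"}
NOISE_EVENT_TYPES = {"startup", "shutdown", "parse_error"}


def classify(event_type: str) -> str:
    if event_type in INTERACTION_EVENT_TYPES:
        return "interaction"
    if event_type in NOISE_EVENT_TYPES or event_type.startswith("unknown_"):
        return "noise"
    return "scan"  # conservative default for unrecognised verbs


def bucket_services(pairs):
    """pairs: iterable of (service, event_type) tuples."""
    kinds = {}
    for service, event_type in pairs:
        kinds.setdefault(service, set()).add(classify(event_type))
    interacted = sorted(s for s, k in kinds.items() if "interaction" in k)
    scanned = sorted(s for s, k in kinds.items() if "interaction" not in k and "scan" in k)
    # Noise-only services appear in neither list.
    return {"scanned": scanned, "interacted": interacted}
```

Because a service lands in "scanned" only when it has no interaction event, the two lists are disjoint by construction, as the commit states.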
anti	f0ee6ff97e	feat(workers): enroll webhook worker in the Workers panel registry Add "webhook" to KNOWN_WORKERS + the start-all preferred order so the Config → Workers panel picks up the row automatically: heartbeat subscription, start/stop controls via the existing systemd helper (decnet-webhook.service.j2 already lands via decnet init's unit glob), and the status-dot lifecycle all come for free. Placed between mutator and the swarm-only agent/forwarder/updater trio — matches the intended startup sequence (bus → api → data-plane workers → egress → swarm management). No frontend change needed; Config.tsx reads the worker list dynamically from GET /api/v1/workers.	2026-04-24 16:34:14 -04:00
anti	2bcef50ac5	feat(webhooks): circuit breaker auto-disables misbehaving subscriptions After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed deliveries, the worker calls trip_webhook_circuit(uuid, ts) which flips enabled=False and stamps auto_disabled_at. The worker sets its reload flag so the next dispatch epoch stops consuming events for the tripped sub entirely — one dead receiver can't poison the shared egress pool anymore. Operator clears the trip via PATCH — setting enabled=True when the sub was previously disabled clears auto_disabled_at, zeros consecutive_failures, and clears last_error. Admin-pause → re-enable hits the same path harmlessly. Three observable states now distinguishable in the UI: - Active enabled=True, auto_disabled_at=NULL - Admin-paused enabled=False, auto_disabled_at=NULL - Tripped enabled=False, auto_disabled_at=<ts> UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and a "N TRIPPED" count in the page header. Hover tooltip tells the operator how to reset ("Re-enable via Edit"). record_webhook_failure now returns the new consecutive_failures count so the worker can compare against the threshold without a second roundtrip. trip_webhook_circuit is idempotent — re-tripping just re-stamps auto_disabled_at. Closes THREAT_MODEL WH-02 and DEBT-037 §1.	2026-04-24 16:24:33 -04:00
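The three observable states and the trip/reset transitions in 2bcef50ac5 can be modelled in memory as below. This is a sketch, not the repository code: the class and method names are hypothetical, and resetting the counter on a successful delivery is an assumption implied by "consecutive".

```python
from dataclasses import dataclass
from typing import Optional

CIRCUIT_THRESHOLD = 5  # default of DECNET_WEBHOOK_CIRCUIT_THRESHOLD per the commit


@dataclass
class Subscription:
    enabled: bool = True
    consecutive_failures: int = 0
    auto_disabled_at: Optional[float] = None
    last_error: Optional[str] = None

    def record_failure(self, error: str, now: float) -> int:
        """Count a failed delivery; trip the circuit at the threshold."""
        self.consecutive_failures += 1
        self.last_error = error
        if self.consecutive_failures >= CIRCUIT_THRESHOLD:
            self.enabled = False
            self.auto_disabled_at = now  # re-tripping just re-stamps
        return self.consecutive_failures

    def record_success(self) -> None:
        self.consecutive_failures = 0  # assumed reset on success

    def reenable(self) -> None:
        """Operator PATCH with enabled=True: clears the trip state."""
        self.enabled = True
        self.auto_disabled_at = None
        self.consecutive_failures = 0
        self.last_error = None

    @property
    def state(self) -> str:
        if self.enabled:
            return "active"
        return "tripped" if self.auto_disabled_at is not None else "admin-paused"
```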
anti	b70845a85d	feat(webhooks): subscription CRUD + HMAC-signed delivery client Introduces the webhook egress foundation — a new WebhookSubscription table, admin-gated CRUD under /api/v1/webhooks, and the shared delivery client that both the test-ping route and the upcoming worker will use. No worker yet; this commit is API + model + client only. Simple-mode enum (AttackerDetail / DeckyStatus / SystemStatus) expands to bus-topic patterns at the router layer; storage is always the raw pattern list. Advanced mode lets admins supply raw NATS-style patterns directly. Filter-at-subscribe: the worker (next commit) will subscribe to the union of patterns across enabled subscriptions. Delivery client handles HMAC-SHA256 signing (X-DECNET-Signature), retry on 429/5xx/network errors with jittered backoff, no-retry on 4xx. Secrets never leave the server on GET/LIST — only the create response carries the secret for copy-out. CRUD routes publish WEBHOOK_SUBSCRIPTIONS_CHANGED on the bus after every mutation so the (future) worker can hot-reload. Opens DEBT-037 for the deferred items (circuit breaker, dead-letter, batch delivery, payload templates, secret-at-rest).	2026-04-24 15:30:05 -04:00
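A minimal sketch of the signing and retry rules named in b70845a85d, using only the standard library. The hex encoding of the `X-DECNET-Signature` value and the function names are assumptions; the commit specifies only HMAC-SHA256, retry on 429/5xx/network errors, and no retry on other 4xx.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Receiver-side check using a constant-time comparison."""
    return hmac.compare_digest(sign_body(secret, body), header_value)


def should_retry(status: int) -> bool:
    """Retry on 429 and 5xx; other 4xx responses are final."""
    return status == 429 or 500 <= status <= 599
```

A receiver should verify against the raw bytes it received, before any JSON parsing, since re-serialising the body can change the digest.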
anti	162f7c1194	feat(api/sse): per-user connection cap + viewer-safe invariant New decnet/web/sse_limits.py provides sse_connection_slot, an async context manager that counts live SSE connections per user UUID and raises 429 when a per-user cap is exceeded (default 5, override via DECNET_SSE_MAX_PER_USER). Wired into both SSE generators as their first async with, so the cap check fires before any stream data is yielded. The cap must sit inside the generator — StreamingResponse returns before the generator body runs, so a handler-level wrapper would release the slot immediately. Put prefetch + slot + loop all under the one async with. Also documents F6/I (role leakage) as mitigated-by-construction via handler docstrings: every event type on both streams wraps data already reachable via viewer-gated REST, so no per-event filter is needed until a new event family is introduced. The invariant is written into the handler docstrings so a future PR can't silently add admin-only events. Resolves THREAT_MODEL F6/I and F6/D.	2026-04-24 15:01:20 -04:00
anti	df84981954	feat(api): pin response_model on dict-returning mutation routes Every mutation route that returned an untyped dict now declares response_model at the decorator. MessageResponse covers the eight {"message": ...} envelopes (change-password, mutate-decky, mutate-interval, update-deployment-limit, update-global-mutation-interval, delete-user, update-user-role, reset-user-password). Purpose-built models cover the richer shapes (DeployResponse for /deckies/deploy, PurgeResponse for /config/reinit, ReapReportResponse for /reap-orphans, UserResponse for /config/users). 204-No-Content and Response/ORJSONResponse routes stay as-is. The wire shape for clients is unchanged — the envelopes already only shipped a message field. What changes is that a handler which accidentally returns a richer dict (e.g. a full user row including password_hash) would be silently stripped to the declared fields at serialization time. Also flips F4/D "expensive LIKE" to accepted (new DA-09) — the /logs and /attackers search routes LIKE-scan unbounded columns, but both are admin-gated, limit-capped, and operator rate-limit scope per DA-04. FTS5 stays a performance TODO, not a security blocker.	2026-04-24 14:27:58 -04:00
anti	a935bf2663	feat(api): cap offset on list-topologies and transcript endpoints The other five query endpoints (/logs, /attackers, /attacker-commands, /bounties, /topologies/{id}) already declared le=2147483647 on offset; these two were inconsistently uncapped. Bring them in line to close the F4/D deep-pagination row. Also resolves F4/T (ORM sort injection — already mitigated by the regex pattern on /attackers sort_by, no other route accepts a column name) and F4/D (limit cap — already universal) with code pointers.	2026-04-24 14:14:25 -04:00
anti	99ccd41bb5	feat(api/artifacts): explicit Content-Disposition + X-Content-Type-Options Harden the attacker-controlled artifact download path (F7) with explicit response headers instead of relying on Starlette's defaults (which only emit attachment for non-ASCII filenames and never set nosniff). Also resolves the THREAT_MODEL F7 path-traversal row (containment check was already in _resolve_artifact_path) and the fleet-deploy detail=str(e) audit (all four sites are admin-gated deliberate validator UX or structured worker-response fields).	2026-04-24 13:24:34 -04:00
anti	323077b383	fix(web/transcripts): fall back to shard-scan when Log row has no shard_path sessrec.c emits the session_recorded SD blob with sid/service/src_ip/duration_s/bytes/truncated — it never emitted shard_path. The web handler still asked for fields.shard_path, got "", tripped the sessions-YYYY-MM-DD.jsonl basename regex and returned 400 "invalid shard name" for every legitimate transcript request. Handler now: - Fast-paths when fields.shard_path IS present and validates (for any future emitter or ingester that backfills it). - Otherwise enumerates sessions-YYYY-MM-DD.jsonl shards under ARTIFACTS_ROOT/{decky}/{service}/transcripts/ (newest first) and returns the first one whose per-sid index contains our sid. - Security invariant preserved: only files whose basename matches the _SHARD_BASENAME_RE are ever opened, and they always resolve inside ARTIFACTS_ROOT. A forged fields.shard_path is silently ignored. - Soft-fails OSError/PermissionError on the transcripts dir (decky containers often write it with a uid the API can't read) — returns 404 instead of a 500 traceback. test_forged_shard_path_blocked updated to match the new semantics: forgery is ignored, the real shard is served via fallback. The invariant (no /etc/passwd access) is still asserted by the fact that status is 200 with data from the test shard.	2026-04-24 01:18:40 -04:00
anti	0eb0b32c7a	refactor(swarm): enroll bundle switches from exclude list to include list Exclude lists fail open — anything new at the master's repo root (venvs, logs, dev notes, .env.local, local DB dumps) silently leaks into every agent bundle. On this box a stray .311 venv (335 MB) + logs/ (220 MB) bloated the tarball to ~150 MB and blew test_enroll_bundle timeouts. Replace _EXCLUDES + _is_excluded with _INCLUDED_ROOT_FILES + _INCLUDED_DIRS + _EXCLUDED_DECNET_SUBTREES and iterate via os.walk with in-place dirnames[:] pruning so master-only subtrees (decnet/web, decnet/mutator, decnet/profiler) and __pycache__ aren't descended into at all. Bundle contents are now strictly: pyproject.toml + the decnet/ package minus the three master-only subtrees. Synthetic entries (INI, certs, systemd units) unchanged — they were always added inline, not from the tree walk. test_enroll_bundle.py: 20/20 pass in 24s (was timing out at 15s/test).	2026-04-23 21:47:47 -04:00
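The include-list walk described in the commit above might look roughly like this. It is a sketch under assumptions: `iter_bundle_files` and the constant names are hypothetical, and only the `os.walk` traversal with in-place `dirnames[:]` pruning, the three master-only subtrees, and the `__pycache__` exclusion come from the commit message.

```python
import os

INCLUDED_ROOT_FILES = ("pyproject.toml",)
INCLUDED_DIRS = ("decnet",)
# Master-only subtrees named in the commit message.
EXCLUDED_SUBTREES = {
    os.path.join("decnet", "web"),
    os.path.join("decnet", "mutator"),
    os.path.join("decnet", "profiler"),
}
EXCLUDED_DIR_NAMES = {"__pycache__"}


def iter_bundle_files(root):
    """Yield paths (relative to root) that belong in the bundle."""
    for name in INCLUDED_ROOT_FILES:
        if os.path.isfile(os.path.join(root, name)):
            yield name
    for top in INCLUDED_DIRS:
        for dirpath, dirnames, filenames in os.walk(os.path.join(root, top)):
            rel_dir = os.path.relpath(dirpath, root)
            # Assigning to dirnames[:] prunes in place, so os.walk never
            # descends into the excluded directories at all.
            dirnames[:] = sorted(
                d for d in dirnames
                if d not in EXCLUDED_DIR_NAMES
                and os.path.join(rel_dir, d) not in EXCLUDED_SUBTREES
            )
            for filename in sorted(filenames):
                yield os.path.join(rel_dir, filename)
```

Because nothing outside the include lists is ever visited, a stray virtualenv or log directory at the repo root cannot reach the bundle, which is the fail-closed property the commit is after.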
anti	2f4f81e5de	feat(api): rate-limit /auth/login + scaffold threat model Adds slowapi two-bucket rate limit on /auth/login — 10 attempts per 5 minutes per-IP AND per-username, tripping either → 429. Per-IP catches botnets hitting one account; per-username catches distributed credential stuffing against one account. In-memory storage: dashboard API is single-process, Redis is disproportionate for v1. X-Forwarded-For is deliberately NOT trusted (spoofable); reverse-proxy deployments get one shared bucket per proxy IP. Logged in the threat model as accepted risk DA-08, to be revisited when a verified-proxy config lands. Also scaffolds development/THREAT_MODEL.md with STRIDE-per-element methodology, system-context DFD, and Dashboard↔API as the first fully worked component (7 sub-flows, ~50 threat entries). F1 Authn ships with 3 threats mitigated: rate limit (new), uniform 401 (verified already in place), bcrypt length clamp (verified already in place via Pydantic max_length=72).	2026-04-23 13:25:28 -04:00
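The two-bucket rule (either bucket tripping yields a 429) can be illustrated with a small in-memory sliding-window limiter. This is not slowapi and makes no claim about its internals. The class name, the injected clock, and the choice to record only allowed attempts are assumptions made for the sketch. The 10-per-5-minutes default comes from the commit message.

```python
import time
from collections import defaultdict, deque


class TwoBucketLimiter:
    """Allow an attempt only if BOTH the per-IP and per-username buckets have room."""

    def __init__(self, limit=10, window_s=300.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits = defaultdict(deque)  # (kind, value) -> attempt timestamps

    def _has_room(self, key, now):
        hits = self._hits[key]
        # Evict timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        return len(hits) < self.limit

    def allow(self, ip, username):
        now = self.clock()
        keys = (("ip", ip), ("user", username))
        if not all(self._has_room(key, now) for key in keys):
            return False  # the caller would answer 429
        for key in keys:
            self._hits[key].append(now)
        return True
```

Keying on the peer address rather than `X-Forwarded-For` matches the commit's stated choice, at the cost of one shared bucket per proxy IP.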
anti	8cbb7834ef	feat(web): SMTP victim-domain + stored-mail panels on attacker detail Adds GET /attackers/{uuid}/smtp-targets (viewer) and GET /attackers/{uuid}/mail (admin) endpoints, plus two new sections on the attacker detail page: VICTIM DOMAINS rollup (aggregate-only, federation-gossip-safe) and STORED MAIL with a drawer that decodes headers, lists attachments, and downloads the raw .eml via the existing artifact endpoint (?service=smtp).	2026-04-22 22:33:53 -04:00
anti	c50448995b	feat(smtp): capture full messages + attachments to disk SMTP template now writes each accepted DATA body as a .eml file into a bind-mounted per-decky quarantine dir and emits a `message_stored` log with sha256, size, decoded headers, and an attachment manifest (filename + sha256 + size + content-type). Attachment hashing uses the decoded payload so operators can match against VT / MalwareBazaar directly. Body accumulator is capped at SMTP_MAX_BODY_BYTES (default 10 MB, matching the EHLO SIZE advert) so a streaming client can't OOM the container. The existing /api/v1/artifacts/{decky}/{stored_as} endpoint now takes an optional ?service= query param (defaults to ssh for back-compat) and can serve .eml files out of the smtp subdir. Forensic metadata rides the normal log pipeline, same as SSH file_captured.	2026-04-22 22:17:50 -04:00
anti	6f537f52c2	fix(topology): remove DMZ gateway auto-attach on LAN create POST /topologies/{id}/lans previously called _auto_attach_gateway() whenever a non-DMZ LAN was created, which wired the DMZ gateway decky to every new subnet. That's why a deployed gateway ended up with eth0..ethN on every LAN regardless of what the user drew in MazeNET. Drop the auto-attach helper entirely. The DMZ_ORPHAN deploy-time validator (decnet/topology/validate.py:65-110) stays strict — users must explicitly wire the gateway to each subnet they want bridged, which is the whole point of having a topology editor. useMazeApi.ts: drop stale auto-bridge reference from comment.	2026-04-22 17:14:09 -04:00
anti	13ea916943	feat(workers): add start + start-all endpoints (systemd supervisor) POST /api/v1/workers/{name}/start — 202 on acceptance, 404 unknown worker, 503 if the unit file is not installed, 502 if systemctl returns non-zero (stderr snippet in detail, full stack logged). Admin only. POST /api/v1/workers/start-all — best-effort: walks the worker list in dependency order (bus → api → data-plane), skips already-active and uninstalled units, aggregates outcomes into {started, already_running, failed[]}. Returns 200 even on partial failure; the caller reads the three lists. Both endpoints delegate to the systemd_control helper, so the attack surface for "what gets executed" is locked to `decnet-<validated-name>.service` at two layers (router KNOWN_WORKERS + helper regex).	2026-04-22 14:12:29 -04:00
anti	0fbb07c2ec	feat(workers): bus-backed Workers panel (registry, control, installed flag) Ships the backend half of Config → Workers: * Worker registry aggregates `system..health` + `system.bus.health` heartbeats into a last-seen dict; OK / STALE / UNKNOWN tiers drop out of a 90s window (3× the 30s heartbeat interval). `GET /api/v1/workers` returns the snapshot plus `bus_connected` (so the UI can explain "all UNKNOWN" when the bus socket is down) and a per-row `installed` flag populated from `systemctl list-unit-files decnet-.service` (cached 30s). `POST /api/v1/workers/{name}/stop` publishes a stop intent on `system.<name>.control`; workers listen via the shared control listener in `bus/publish.py`. * Heartbeat + control listener wired into collector / profiler / sniffer / prober / mutator worker loops. API self-heartbeats too so the panel always has one ground-truth row. * Topic helper `system_control(name)` + tests covering builder validation, control listener shutdown path, and the API surface (auth gating, bus-connected field, unknown-name 404). Adds `StartFailure` / `StartAllResponse` models in anticipation of the upcoming start endpoints (DEBT-034).	2026-04-22 14:10:39 -04:00
anti	6725197d58	test(web): transcripts API + attacker-transcripts router coverage Paging, truncation surfacing, admin gate, path traversal, sid-regex and decky-mismatch rejection for /transcripts; mirror coverage for /attackers/{uuid}/transcripts. Flips the Session Recording box in the roadmap (sessrec pty relay now shipping end-to-end).	2026-04-21 23:11:40 -04:00
anti	6e522c5a55	feat(web): transcripts API + repository lookups Adds get_attacker_transcripts (mirror of artifacts for session_recorded logs) and get_session_log for sid→shard resolution. New /api/v1/transcripts/{decky}/{sid}?offset=&limit= pages asciinema events out of the shared JSONL day-shard via an mtime-keyed byte-offset index — never scans the whole shard per request. New /api/v1/attackers/{uuid}/transcripts lists sessions for drilldown. Both endpoints admin-gated.	2026-04-21 23:06:39 -04:00
anti	8f25ff677f	feat(engine,api): add orphan topology resource reaper Topology rows deleted without a proper teardown leave Docker containers and bridge networks behind, holding IPAM pools that cause 403 "Pool overlaps" on the next deploy at the same subnet. - engine/reaper.py walks the local Docker daemon, extracts the 8-char topology prefix from every decnet_t_* resource, and force-removes containers + networks whose prefix is not in the repo. - POST /api/v1/topologies/reap-orphans (admin-only) returns a report of live/orphan prefixes and what was removed. - Resources belonging to live topologies are never touched; per-resource errors are captured without aborting the sweep.	2026-04-21 22:13:44 -04:00
anti	f611e7363b	feat(mutator,web): live topology mutation pipeline backend (DEBT-030) Wire the mutator and web API into the service bus so live-topology edits flow sub-second from enqueue to UI: - Mutator publishes every state transition on the bus (mutation.applying/applied/failed + topology.status). Fire-and-forget; DB stays source of truth. - Mutator watch loop subscribes to topology.*.mutation.enqueued and wakes early via asyncio.Event — the 10s poll becomes a fallback heartbeat, not the primary dispatch trigger. - POST /topologies/{id}/mutations publishes mutation.enqueued after the DB write succeeds. - New GET /topologies/{id}/events SSE route: snapshot on connect (status + in-flight mutations), live forwards topology.{id}.> bus events, 15s keepalive. ?token= auth mirrors /stream. - New decnet/bus/app.py — process-wide lazy bus singleton for the API, closed cleanly on lifespan shutdown.	2026-04-21 14:38:25 -04:00
anti	071312fc0c	feat(web/api): expose archetype catalog endpoint /api/v1/topologies/archetypes returns the archetype registry (slug, display name, description, preferred services/distros, nmap_os fingerprint) so the frontend wizard can render a live catalog instead of hardcoding a copy.	2026-04-21 10:24:01 -04:00
anti	e8f9c955b3	feat(swarm): heartbeat-driven topology resync for agent-pinned deployments Agent heartbeats now carry an applied-topology snapshot. The master heartbeat handler compares the reported version_hash against what canonical_hash yields for the hydrated topology pinned to that host and flags Topology.needs_resync on divergence (or when the agent reports no topology at all while master expects one). The mutator watch loop gains reconcile_agent_resyncs, which re-pushes the current hydrated blob via AgentClient.apply_topology without touching status, then clears the flag on success. Push failures leave the flag set so the next tick retries.	2026-04-21 01:35:12 -04:00
anti	5a0cf5d7c8	feat(topology): add target_host_uuid to pin topologies to swarm agents Adds the `target_host_uuid` FK on `Topology` plus wiring through the two create endpoints (`POST /topologies`, `POST /topologies/blank`). Validates the mode/host pair: `mode='agent'` now requires a known, routable host; `mode='unihost'` must leave the field unset. Surfaced on `TopologySummary` so list/detail responses expose it. Purely additive at the schema level — existing unihost flows unchanged (field defaults to `NULL`). Step 1 of the agent <-> topology integration.	2026-04-21 01:19:45 -04:00

131 Commits