Commit Graph

305 Commits

Author SHA1 Message Date
b754e9aa8b refactor(validate): move forwards_l3 overload explanation into check docstring
The 17-line block comment at _RULES was prose covering for a design wart.
The explanation belongs on the function itself — moved there and condensed.
_RULES now has a 2-line pointer instead of an essay.
2026-04-30 22:10:41 -04:00
402d6584ba fix(topology_store): use sqlite3.Row for named column access in current()
Row unpacking by positional index breaks silently on schema changes.
row_factory = sqlite3.Row gives named access with minimal overhead over a tuple.
2026-04-30 22:09:51 -04:00
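The change in 402d6584ba can be illustrated with a minimal sketch. The table name and columns below are hypothetical (the real schema is not shown in this log); only `sqlite3.Row` and `row_factory` are the standard-library pieces the commit names.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support access by column name
conn.execute("CREATE TABLE topology (id TEXT, status TEXT, created_at TEXT)")
conn.execute("INSERT INTO topology VALUES ('abc123', 'active', '2026-04-30')")

row = conn.execute("SELECT * FROM topology").fetchone()

# Named access keeps working if columns are later reordered or added,
# whereas row[0] / row[1] would silently pick up the wrong values.
topology_id = row["id"]
status = row["status"]
column_names = row.keys()
conn.close()
```

Positional indexing still works on a `sqlite3.Row`, so existing call sites can be migrated one at a time.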
9ad62d8177 fix(compose): name the topology_id prefix length constant
topology_id[:8] appeared twice with no explanation. 8 chars follows the
short-hash convention (git itself defaults to 7); collision-safe within a
single deployment's network namespace.
2026-04-30 22:09:26 -04:00
eb7ccd0006 fix(reuse_worker): remove noqa: BLE001 (rule not in ruff select)
fix(generator): correct service pool count in _SVC_MIN/_SVC_MAX comment

BLE001 is not in ruff.toml select (F/ANN/RUF/E/W only); the suppressions
were whispering apologies to a linter that wasn't listening. Generator
comment now cites the actual ~28-entry non-singleton service pool.
2026-04-30 22:06:44 -04:00
17480093a9 refactor(topology_ops): decompose apply() into focused helpers
apply() was an 85-line function handling hash verification, validation,
superseding teardown, bridge/compose provisioning, and store persistence.
Extracted _check_hash_and_validate(), _teardown_superseded(), and _materialise()
so each step is independently readable and testable.
2026-04-30 21:56:48 -04:00
d1ed2701e7 refactor(generator): promote nested functions; rename used_combos to seen_service_pairs
_take_ip and _new_decky were closures capturing outer-scope state. Promoted to
module-level with explicit parameters. seen_service_pairs name makes the intent
clear — it prevents the same service frozenset from being assigned repeatedly.
2026-04-30 21:53:45 -04:00
07e6bafff8 fix(validate): narrow broad except to ImportError in psutil port-collision check
The original except Exception silently disabled port collision detection for
any runtime error, not just a missing package. Now only ImportError degrades
gracefully; real psutil failures propagate.
2026-04-30 21:53:05 -04:00
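A rough sketch of the pattern described in 07e6bafff8, assuming the check imports psutil lazily. The function name `port_collisions` and the `module_name` parameter are hypothetical and exist only so the fallback path can be exercised without uninstalling anything.

```python
import importlib
from typing import Optional


def port_collisions(wanted: set[int], module_name: str = "psutil") -> Optional[set[int]]:
    """Return the wanted ports that are already listening, or None if psutil is absent."""
    try:
        psutil = importlib.import_module(module_name)
    except ImportError:
        # Only a missing package degrades gracefully: the check is skipped.
        return None
    # Errors raised below (for example psutil.AccessDenied) are NOT swallowed;
    # they propagate so a broken check is visible to the caller.
    listening = {
        conn.laddr.port
        for conn in psutil.net_connections(kind="tcp")
        if conn.laddr and conn.status == psutil.CONN_LISTEN
    }
    return wanted & listening


# Simulate the "package not installed" path with a module name that cannot exist.
skipped = port_collisions({22, 80}, module_name="no_such_module_for_demo")
```

`ModuleNotFoundError` is a subclass of `ImportError`, so the narrowed clause still covers the ordinary "not installed" case.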
84e0ac4a43 fix(topology): cache IPAllocator host set; type repo params as BaseRepository
_host_set is computed once in __init__ — reserve() and is_free() were rebuilding
the full host frozenset on every call. BaseRepository already existed; the Any
annotations were just never updated.
2026-04-30 21:52:29 -04:00
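The caching idea in 84e0ac4a43 might look roughly like the following. This is a sketch, not the project's class: `IPAllocatorSketch` and its `_reserved` set are assumptions, and only `_host_set`, `reserve()` and `is_free()` are names taken from the commit message.

```python
import ipaddress


class IPAllocatorSketch:
    """Hypothetical allocator that builds its host set once, in __init__."""

    def __init__(self, cidr: str) -> None:
        network = ipaddress.ip_network(cidr)
        # Computed once; hosts() excludes the network and broadcast addresses.
        self._host_set = frozenset(network.hosts())
        self._reserved: set = set()

    def is_free(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return addr in self._host_set and addr not in self._reserved

    def reserve(self, ip: str) -> None:
        if not self.is_free(ip):
            raise ValueError(f"{ip} is not available")
        self._reserved.add(ipaddress.ip_address(ip))


alloc = IPAllocatorSketch("10.0.0.0/29")
free_before = alloc.is_free("10.0.0.1")
alloc.reserve("10.0.0.1")
free_after = alloc.is_free("10.0.0.1")
```

Each `is_free()` call is then a pair of set lookups instead of a full walk over the subnet.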
257857338c fix(api): replace threading.Lock with asyncio.Lock for hydration guard
await inside a threading.Lock yields to the event loop while the OS
thread still holds the lock — potential deadlock under FastAPI thread
pool dispatch. asyncio.Lock is the correct primitive for async
critical sections. Also fixed stale diurnal.py docstring that had the
delegation direction backwards.
2026-04-30 21:24:11 -04:00
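A minimal sketch of a hydrate-once guard built on `asyncio.Lock`, as 257857338c describes. The `HydrationGuard` class and the `_load_from_db` stand-in are hypothetical; the real planner and repository calls are not shown in this log.

```python
import asyncio


class HydrationGuard:
    """Hypothetical guard: runs the expensive load at most once per instance."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._hydrated = False
        self.loads = 0

    async def _load_from_db(self) -> None:
        await asyncio.sleep(0)  # stands in for a real database round trip
        self.loads += 1

    async def ensure_hydrated(self) -> None:
        if self._hydrated:
            return
        # Waiting tasks are suspended by the event loop, so no OS thread
        # is blocked while the lock is held across the await below.
        async with self._lock:
            if not self._hydrated:  # re-check after acquiring the lock
                await self._load_from_db()
                self._hydrated = True


async def _demo() -> int:
    guard = HydrationGuard()  # created inside the running loop
    await asyncio.gather(*(guard.ensure_hydrated() for _ in range(5)))
    return guard.loads


load_count = asyncio.run(_demo())
```

Five concurrent callers result in a single load; the second check inside the lock is what stops the waiters from repeating it.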
3fce597a70 docs(bodies): document intentional shared _body_canary in dispatch table
2026-04-30 21:19:07 -04:00
2629a8a0de fix(fake): rename prompt to _prompt, drop noqa suppression
2026-04-30 21:18:55 -04:00
a8c69155ff fix(planner): surface dropped weight entries in PUT /realism/config response
_parse_weights was silently dropping content_class values that don't
belong on their target list with no operator feedback. Changed it to
return (weights, dropped), apply_payload to collect and return all
dropped names, and put_config to include dropped_entries in the
response when non-empty.
2026-04-30 21:18:41 -04:00
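The shape of the change in a8c69155ff can be sketched as below. The function bodies, the `allowed` parameter and the sample content classes are assumptions; only the `(weights, dropped)` return shape and the `dropped_entries` response key come from the commit message.

```python
def parse_weights(raw: dict[str, float], allowed: set[str]) -> tuple[dict[str, float], list[str]]:
    """Split raw weights into accepted entries and the names that were dropped."""
    weights: dict[str, float] = {}
    dropped: list[str] = []
    for name, value in raw.items():
        if name in allowed:
            weights[name] = float(value)
        else:
            dropped.append(name)  # reported back instead of vanishing silently
    return weights, dropped


def build_put_response(weights: dict[str, float], dropped: list[str]) -> dict:
    """Include dropped_entries only when something was actually dropped."""
    response: dict = {"weights": weights}
    if dropped:
        response["dropped_entries"] = dropped
    return response


# Hypothetical payload: "bogus" does not belong on this target list.
weights, dropped = parse_weights(
    {"invoice": 2.0, "memo": 1.0, "bogus": 5.0},
    allowed={"invoice", "memo"},
)
response = build_put_response(weights, dropped)
```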
8a40f6ced0 fix(personas_pool): re-stat after read to avoid caching stale mtime
The initial stat and read happened without a lock between them. A file
change mid-window stored the mtime of the pre-change stat against the
post-change content, suppressing the next reload. Re-stat after
read_text; fall back to the pre-read stat only on OSError.
2026-04-30 21:17:50 -04:00
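One way to express the re-stat described in 8a40f6ced0, as a sketch under the assumption that the cache keys on `st_mtime`. The helper name and the temporary file are hypothetical.

```python
import tempfile
from pathlib import Path


def read_with_fresh_mtime(path: Path) -> tuple[str, float]:
    """Read a file and return (text, mtime), taking the mtime after the read."""
    pre_read = path.stat()
    text = path.read_text(encoding="utf-8")
    try:
        # Stat again so the cached mtime is never older than the content.
        mtime = path.stat().st_mtime
    except OSError:
        # File vanished or became unreadable after the read: fall back.
        mtime = pre_read.st_mtime
    return text, mtime


with tempfile.TemporaryDirectory() as tmp:
    pool_file = Path(tmp) / "personas.json"
    pool_file.write_text('{"personas": []}', encoding="utf-8")
    content, cached_mtime = read_with_fresh_mtime(pool_file)
    mtime_on_disk = pool_file.stat().st_mtime
```

If the file changes during the read, the worst case is now an extra reload on the next check, not a suppressed one.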
1e1c92abc3 fix(bodies): type make_body_with_llm persona parameter via TYPE_CHECKING
The persona arg was typed Any to avoid a circular import. Added a
TYPE_CHECKING guard to import EmailPersona annotation-only so mypy
has the type without a runtime import cycle.
2026-04-30 21:17:26 -04:00
ebe15310ab fix(api): hydrate planner from DB exactly once on first GET, not on every read
get_config was calling planner.apply_payload on every GET request, racing
concurrent reads on module-level globals. Added a _hydrated flag + lock
so DB hydration runs at most once per process lifetime; put_config marks
it done too. Test fixture resets the flag between tests.
2026-04-30 21:17:03 -04:00
c7fcd86be4 fix(planner): guard apply_payload and reset_to_defaults with a lock
Concurrent PUT requests could observe a half-updated planner between
the four sequential global assignments. Added _planner_lock so the
rebind is atomic; same lock wraps reset_to_defaults.
2026-04-30 21:15:12 -04:00
f597d70430 fix(realism): use minute-precision datetime in in_active_hours
personas.in_active_hours was discarding the minute component of the
active-hours window, making "09:30-17:45" behave as "09:00-17:00".
Rewrote it to delegate to diurnal.in_work_hours (which uses full
minute arithmetic) and updated the scheduler caller to pass the full
datetime instead of now_dt.hour.
2026-04-30 21:14:36 -04:00
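A small sketch of minute-precision window checking in the spirit of f597d70430. The function name, the half-open interval (end time excluded) and the overnight handling are assumptions; the actual `diurnal.in_work_hours` implementation is not shown in this log.

```python
from datetime import datetime, time


def in_window(now: datetime, window: str) -> bool:
    """Return True if `now` falls inside an "HH:MM-HH:MM" window."""
    start_text, end_text = window.split("-")
    start = time.fromisoformat(start_text)
    end = time.fromisoformat(end_text)
    current = now.time().replace(second=0, microsecond=0)
    if start <= end:
        return start <= current < end
    # Assumed behaviour for windows that wrap past midnight, e.g. "22:00-06:00".
    return current >= start or current < end


window = "09:30-17:45"
too_early = in_window(datetime(2026, 4, 30, 9, 15), window)
at_start = in_window(datetime(2026, 4, 30, 9, 30), window)
late_afternoon = in_window(datetime(2026, 4, 30, 17, 30), window)
at_end = in_window(datetime(2026, 4, 30, 17, 45), window)
```

An hour-only comparison would report 09:15 as active and 17:30 as inactive for this window, which is the bug the commit describes.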
f6422f2529 fix(heartbeat): replace remaining bare except Exception with SQLAlchemyError and typed builtins
2026-04-30 21:08:26 -04:00
542d129d6f refactor(services_live): replace string-sniffed error dispatch with typed exception subclasses
ServiceNotFoundError (→ 404) and ServiceConflictError (→ 409) replace the
"not found" / "already on" / "not on" substring checks in _map_mutation_error;
base ServiceMutationError still maps to 422. Fixes three pre-existing test
status-code assertions (201 vs 200 on POST endpoints).
2026-04-30 20:49:29 -04:00
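The typed-exception dispatch from 542d129d6f could be sketched like this. The class names and status codes come from the commit message; the `status_for` helper is a hypothetical stand-in for `_map_mutation_error`, whose real signature is not shown.

```python
class ServiceMutationError(Exception):
    """Base error for a failed service mutation (maps to 422)."""


class ServiceNotFoundError(ServiceMutationError):
    """The target service or decky does not exist (maps to 404)."""


class ServiceConflictError(ServiceMutationError):
    """The requested state already holds (maps to 409)."""


def status_for(exc: ServiceMutationError) -> int:
    # Dispatch on type, most specific first; the message text is never inspected.
    if isinstance(exc, ServiceNotFoundError):
        return 404
    if isinstance(exc, ServiceConflictError):
        return 409
    return 422


not_found = status_for(ServiceNotFoundError("decky-7"))
conflict = status_for(ServiceConflictError("ssh already on decky-7"))
generic = status_for(ServiceMutationError("invalid config"))
```

Rewording an error message can no longer change the HTTP status a client receives.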
a5487eb55f refactor(enroll-bundle): extract bundle_builder and move DTOs to swarm models
Pure tarball construction (_build_tarball, _render_*, _iter_included,
_SYSTEMD_UNITS) moved to decnet/swarm/bundle_builder.py — no FastAPI
dependency, independently testable. EnrollBundleRequest/Response moved
to decnet/web/db/models/swarm.py alongside the other swarm DTOs.
Router drops from 504 to 260 lines; keeps only the in-memory token
registry, sweeper, and endpoints.
2026-04-30 20:39:42 -04:00
e124f9e296 refactor(swarm): extract _shard_payload helper and promote _dispatch to module-level
2026-04-30 20:25:38 -04:00
c648d8b04e fix(heartbeat): replace bare except Exception with specific types and intent comments
2026-04-30 20:19:52 -04:00
72498f81b2 fix(ui): surface attacker date_hdr in mail table and drawer
MailDrawer was reading fields.date / from_addr / message_id —
all wrong; actual log field names are date_hdr, from_hdr,
message_id_hdr, to_hdr. The mail table in AttackerDetail
showed only DECNET capture time and used from_addr instead
of from_hdr.  Add a DATE (attacker) column so the attacker-
supplied Date header (including timezone) is visible at a
glance — useful for correlating campaigns like the Tiscali
run where IPs used distinct TZs (+0800 vs -0700).
2026-04-30 14:11:08 -04:00
d0b07bdf52 fix(smtp_relay): inject From: header if absent so attacker address shows in client
Relay-test scripts send minimal DATA with no headers. Without a From:
header the mail client falls back to displaying the envelope sender
(upstream_sender). Inject From: <attacker MAIL FROM> before forwarding
when the message has no existing From: header.
2026-04-30 12:43:41 -04:00
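A sketch of the header injection described in d0b07bdf52, using the standard `email` package. The helper name and sample addresses are hypothetical, and adding a blank separator line when the message has no headers at all is an assumption about how header-less DATA should be handled.

```python
from email import message_from_bytes


def ensure_from_header(raw: bytes, mail_from: str) -> bytes:
    """Prepend a From: header built from the envelope sender if none exists."""
    msg = message_from_bytes(raw)
    if msg["From"] is not None:
        return raw  # leave messages that already carry a From: header untouched
    header = b"From: <" + mail_from.encode("ascii", "replace") + b">\r\n"
    if not msg.keys():
        # No headers at all: add the blank line that ends the header block.
        header += b"\r\n"
    return header + raw


bare = b"relay test body\r\n"
patched = ensure_from_header(bare, "attacker@example.net")
parsed = message_from_bytes(patched)

already_set = b"From: someone@example.org\r\nSubject: hi\r\n\r\nbody\r\n"
untouched = ensure_from_header(already_set, "attacker@example.net")
```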
4d12fb6a03 fix(smtp_relay): upgrade to STARTTLS before AUTH if server advertises it
Servers like mail.resacachile.cl only expose AUTH after STARTTLS. Issue
starttls() + re-ehlo() when the server advertises the extension.
2026-04-30 12:40:17 -04:00
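The upgrade sequence from 4d12fb6a03 can be sketched with the `smtplib` methods `ehlo()`, `has_extn()`, `starttls()` and `login()`. The function name is hypothetical, and `RecordingClient` is a stand-in that only records calls so the order can be checked without a live server.

```python
import smtplib


def login_with_starttls(client: smtplib.SMTP, user: str, password: str) -> None:
    """Upgrade to TLS when offered, then authenticate."""
    client.ehlo()
    if client.has_extn("starttls"):
        client.starttls()
        # Extensions are reset after TLS; EHLO again to learn AUTH.
        client.ehlo()
    client.login(user, password)


class RecordingClient:
    """Demo stand-in for smtplib.SMTP; not part of the relay code."""

    def __init__(self, offers_tls: bool) -> None:
        self.offers_tls = offers_tls
        self.calls: list[str] = []

    def ehlo(self) -> None:
        self.calls.append("ehlo")

    def has_extn(self, name: str) -> bool:
        return self.offers_tls and name.lower() == "starttls"

    def starttls(self) -> None:
        self.calls.append("starttls")

    def login(self, user: str, password: str) -> None:
        self.calls.append("login")


tls_client = RecordingClient(offers_tls=True)
login_with_starttls(tls_client, "user", "secret")
plain_client = RecordingClient(offers_tls=False)
login_with_starttls(plain_client, "user", "secret")
```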
633594b110 fix(smtp_relay): use correct async-for bus subscription in probe listener
bus.subscribe() is sync and returns an async iterator, not a coroutine.
Awaiting it caused an immediate crash at startup; bus.next_message() does
not exist either. Rewrote _run_smtp_probe_listener to use the standard
pattern: sub = bus.subscribe(...) / async with sub / async for event in sub.
2026-04-30 12:35:45 -04:00
761c23a07c fix(smtp_relay): emit service=smtp_relay in syslog so ingester can gate probe publish
SERVICE_NAME was hardcoded to 'smtp' in server.py; the ingester's probe
publish guard checked service == 'smtp_relay' and never matched.

Read SMTP_SERVICE_NAME from env (default 'smtp'); smtp_relay compose
fragment sets it to 'smtp_relay' so the two services are distinguishable.
2026-04-30 12:31:29 -04:00
f0d47c5195 fix(smtp): chmod quarantine dir before dropping to logrelay
The bind-mounted quarantine dir is owned by the host decnet user; the
logrelay process had no write access because the Dockerfile USER directive
took effect before the entrypoint could fix permissions.

Run entrypoint as root, chmod 0777 the quarantine dir, then exec the
server under logrelay via su.
2026-04-30 12:25:37 -04:00
8ae7b9636e feat(smtp_relay): move probe forwarding to realism worker via bus
Attacker probe emails are now forwarded by the master (realism worker)
rather than inside the MACVLAN container, which has no internet gateway.

- New smtp.probe.pending bus topic: ingester publishes when smtp_relay
  message_stored fires; worker subscribes and does the actual delivery
- decnet/orchestrator/drivers/smtp_relay.py: pure-sync forward_probe()
  reads the .eml from disk and sends via smtplib on a thread executor
- worker.py: _run_smtp_probe_listener + _handle_probe_pending subtask;
  limit enforced via count_probe_relays() (DB-backed, restart-safe)
- bounties.py: count_probe_relays() query on probe_relay bounty type
- fleet.py: get_fleet_decky_by_name() to pull service config from DB
- services/smtp_relay.py: upstream_* and probe_limit fields defined in
  config_schema but NOT injected into container env (credentials stay
  out of docker env vars)
- ingester.py: stripped of smtplib; publishes probe.pending and exits
- tests: assert upstream keys absent from container environment
2026-04-30 12:10:58 -04:00
4c0a1309f0 fix(smtp_relay): log upstream error reason in probe_forwarded event
forwarded=0 was silent — now fwd_error carries the exception string so
you can see exactly why the upstream refused (auth failure, connection
refused, timeout, etc).
2026-04-30 11:57:07 -04:00
c78ba6f698 fix(deploy): pre-remove container by name before force-recreate
Docker Compose tracks the previous container by internal ID. When that
container was already removed or renamed, --force-recreate fails with
"No such container". Remove by name first so Compose always starts clean.
2026-04-30 11:54:00 -04:00
fdf38a9d8c feat(smtp_relay): add upstream_sender to fix SPF on probe forwarding
Override the envelope MAIL FROM with a domain we own when talking to the
upstream relay. SPF passes at the recipient; the attacker's From: header
inside the message body is untouched so they see their own address in their
inbox and believe the relay is real.
2026-04-30 11:47:18 -04:00
24cdef9246 feat(smtp_relay): ingest probe_forwarded as probe_relay bounty
Adds probe_forwarded to meaningful event kinds and stores it in the
bounty table as bounty_type=probe_relay with forwarded=true/false, so
the dashboard shows whether the upstream actually accepted the test email.
2026-04-30 11:32:14 -04:00
9a4fe2677b feat(smtp_relay): forward probe emails upstream so attackers verify relay works
First SMTP_PROBE_LIMIT messages per source IP are forwarded via a real
upstream relay (SMTP_UPSTREAM_HOST/PORT/USER/PASS) so the attacker's
test email actually lands in their inbox. All subsequent messages from
the same IP get 250 Ok but only hit the quarantine — campaign content
captured, nothing delivered.
2026-04-30 11:21:04 -04:00
4b7cb42ab1 fix(profiler): extract commands when MSGID=command, not just MSGID=NIL
The Dockerfile PROMPT_COMMAND logger uses --msgid command, so the MSGID
field arrives as 'command' not '-'. The CMD rewrite block was guarded by
event_type == '-' so it never fired, leaving fields['command'] unpopulated
and cmd_text=None for every SSH session command.

Broaden the guard to also match event_type == 'command' with no existing
'command' field, which covers both the intended (MSGID=NIL) and actual
(MSGID=command) wire formats.
2026-04-30 10:57:29 -04:00
bbb1762250 fix(export): one attacker per line in exported JSON
2026-04-30 10:45:03 -04:00
2ddba04f79 feat(attackers): add JSON export endpoint and download button
2026-04-30 10:43:46 -04:00
f0756dcdec fix(ui): use overflow: clip on dash panels so inner scrollbars aren't masked
2026-04-30 00:34:40 -04:00
18393f1e1c fix(ui): bound dashboard height so panels don't overflow viewport
.content-viewport is overflow-y: auto so flex:1 on dash-grid grew to
content height. Fix: dashboard uses height:100% instead of min-height,
and :has(>.dashboard) disables content-viewport scroll only on that
route — all other pages keep their normal scroll.
2026-04-30 00:32:16 -04:00
9ed0094045 fix(ui): reset live feed scroll to top on log update
Sticky thead was floating mid-content when the container auto-scrolled
as new log entries arrived. Pinning scrollTop to 0 on each logs update
keeps the thead at position 0 where it belongs.
2026-04-30 00:30:46 -04:00
fca0953439 fix(ui): dashboard grid fills available viewport height
Use flex: 1 on dash-grid instead of height: 480px so the panels
consume all remaining space below the stat cards; dash-side uses
height: 100% to fill its grid cell.
2026-04-30 00:27:47 -04:00
b364c41736 fix(ui): dashboard panel heights + missing icon
- Use height: 480px on .dash-grid so both columns are the same height;
  side panels split that height via flex instead of their own max-height
- Add LayoutDashboard icon to the DASHBOARD page header
2026-04-30 00:24:27 -04:00
fbc9877ef2 fix(ui): follow-up polish — icons, dashboard bar, filter redesign, bounty/creds sort
- Dashboard: fix invisible bar at bottom of LIVE FEED by constraining
  max-height on the section instead of the inner container; same fix
  for side panels
- Page icons: add violet-accent icon beside h1 on all 9 missing pages
  (CanaryTokens, RealismConfig, SyntheticFiles, PersonaGeneration,
  Attackers, Webhooks, LiveLogs, Topologies, DecoyFleet)
- Attackers filter chips: replace ad-hoc chip buttons with seg-group
  tabs (ALL / ACTIVE N / PASSIVE N / INACTIVE N) matching Credential
  Vault style; country chips use same seg-group treatment
- Credential Vault: add sortable headers to REUSE tab (LAST SEEN,
  PRINCIPAL, KIND, TARGETS, ATTEMPTS); reuses same SortTh pattern
- Bounty: remove CREDENTIALS and PAYLOADS tabs; keep ALL, ARTIFACTS,
  FINGERPRINTS; add EMAIL (artifact subtype, filtered client-side)
2026-04-30 00:20:25 -04:00
9adee07d21 feat(ui): frontend polish sweep — 8 UX fixes
- DeckyFleet: card click opens inspect side-drawer instead of
  auto-filtering (localSearch filter behavior removed)
- Dashboard: LIVE FEED / DECKIES UNDER SIEGE / TOP ATTACKERS panels
  now have fixed max-height with overflow scroll instead of growing
- parseEventBody: defensive RFC 5424 header strip so raw syslog lines
  from the collector render as k=v pills instead of raw text
- Attackers: search placeholder updated; activity (Active/Passive/
  Inactive) and country chip filters added on top of existing IP search
- Credentials + Bounty: sortable column headers (click to asc/desc/clear)
- SwarmHosts + RemoteUpdates: icon extracted from <h1> into flex div
  with violet-accent class, matching site-wide Identities pattern
- Swarm.css: fix --panel-border undefined variable → --border so the
  title border-bottom line is visible on SwarmHosts and RemoteUpdates
2026-04-29 23:56:38 -04:00
a322d88b3c fix(tarpit): resolve topology container name in watcher before PID lookup
2026-04-29 21:14:21 -04:00
917f7e8e54 feat(tarpit): MazeNET topology-scoped tarpit — Inspector controls + topology API
2026-04-29 21:10:02 -04:00
f84c66cf9b feat(ui): tarpit controls on DeckyCard — three-dot dropdown + enable/disable
2026-04-29 20:56:51 -04:00
07b32e2abe fix(tests): patch add_service/remove_service at the router import, not the module
Monkeypatching services_live.add_service had no effect because api_services
already held a local reference to the name. Patch api_services.add_service
and update fake stubs to accept the config kwarg added to the real signature.
2026-04-29 18:50:21 -04:00
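Why patching the defining module had no effect in 07b32e2abe can be shown with two throwaway modules. This is a sketch: the module objects are built in memory with `types.ModuleType`, and the `handler` function is a hypothetical stand-in for a router endpoint.

```python
import types

# Stand-in for the module that defines the function.
services_live = types.ModuleType("services_live")
services_live.add_service = lambda name: f"real:{name}"

# Stand-in for the router module, which did the equivalent of
# `from services_live import add_service` at import time.
api_services = types.ModuleType("api_services")
api_services.add_service = services_live.add_service
exec("def handler(name):\n    return add_service(name)\n", api_services.__dict__)

# Patching the defining module does not touch the router's own reference.
services_live.add_service = lambda name: f"fake:{name}"
patched_at_source = api_services.handler("ssh")

# Patching the name where it is looked up takes effect.
api_services.add_service = lambda name: f"fake:{name}"
patched_at_use_site = api_services.handler("ssh")
```

The same rule applies to pytest's `monkeypatch.setattr`: patch the name in the module where it is looked up, not where it was defined.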
5f4005c47a feat(tarpit): port-selective tc netem tarpit mode with live log events
- GET/POST/DELETE /api/v1/deckies/{name}/tarpit (admin write, viewer GET)
- get_container_veth() + get_container_pid() in network.py via iflink/ip-link
- TarpitRule SQLModel table + TarpitMixin repo (upsert/get/delete/list)
- Background tarpit_watcher_worker: polls /proc/{pid}/net/tcp every 15s,
  emits tarpit_enter/tarpit_exit log events (edge-triggered, with duration)
- tarpit_enabled/tarpit_disabled logs on operator POST/DELETE actions
2026-04-29 18:49:42 -04:00
2fc5f1bdc5 feat(canary): auto-deregister fingerprint slug after first valid beacon
Once a fingerprint canary's HTTP beacon passes all 4 validation layers
and the trigger row lands, the token is immediately set to state=revoked
and canary.<id>.revoked is published on the bus. The slug lookup is
tightened to only return planted tokens, so subsequent requests to the
same URL silently return the transparent GIF without persisting anything
(stealth posture preserved). Plain http/dns canaries with no
fingerprint_nonce are not affected.

Changes:
- sqlmodel_repo/canary.py: add state == "planted" filter to
  get_canary_token_by_slug so revoked slugs resolve to None
- worker.py: after record_canary_trigger, if parsed_fp survived all
  layers and token has a fingerprint_nonce, call
  update_canary_token_state("revoked") + publish CANARY_REVOKED; errors
  are best-effort (trigger row already landed)
- test_worker_http.py: assert state=revoked in test_fp_valid_nonce_persists;
  new test_fp_deregisters_slug_after_valid_hit (second hit records nothing);
  new test_plain_http_canary_not_deregistered (env_file stays planted)
2026-04-29 17:49:31 -04:00