Commit Graph

608 Commits

Author SHA1 Message Date
215251a122 fix(deploy): template --group on the bus ExecStart too
decnet-bus.service.j2 ran with User={{ user }} / Group={{ group }}
but the actual bus CLI invocation hardcoded --group decnet. The bus
chowns /run/decnet/bus.sock to that group at 0660 — so when an
operator ran `decnet init --group anti`, the socket ended up
owned by decnet:decnet while every worker (agent, api, collector,
forwarder, prober, updater) ran as anti and got EACCES on connect().

Each worker's bus-wiring catches the error, logs a warning, sets
bus=None, and carries on — which is correct for the data-plane but
silently kills Workers-panel heartbeats (run_health_heartbeat(None,
...) no-ops). So half the worker grid showed UNKNOWN even though
systemctl confirmed the processes were alive.

Swap the hardcoded --group decnet for --group {{ group }} so the
socket is owned by the same group the workers run under.
2026-04-24 01:09:55 -04:00
e4ccf30133 fix(init): template the polkit rule on --group too
polkit rule 50-decnet-workers.rules hardcoded isInGroup("decnet"),
so when 'decnet init --group anti' installed systemd units as
User=anti / Group=anti, the API (running as anti) could no longer
systemctl start/stop decnet-*.service — polkit fell back to
'interactive authentication required', which in a daemon context is
a hard fail:

  START FAILED · COLLECTOR — Failed to start decnet-collector.service:
  Access denied as the requested operation requires interactive
  authentication.

Rename the rule to .j2, parameterise the group on {{ group }}, and
route _install_polkit through _render_template /
_write_rendered_if_changed. Now the polkit rule matches whatever
group was passed to 'decnet init'.

Test fixture updated to seed the .j2 variant.
2026-04-24 01:07:16 -04:00
08436433ef fix(deploy): relocate StandardOutput/StandardError below multi-line ExecStart
Four templates use backslash line-continuation on ExecStart
(decnet-bus, decnet-forwarder, decnet-listener, decnet-updater). My
earlier sed inserted StandardOutput= and StandardError= right after
the first ExecStart= line, which split the command and systemd fed
those two lines back to the binary as extra positional arguments —
the bus in particular crashed with:

  Got unexpected extra argument
  (StandardOutput=append:/var/log/decnet/decnet.bus.log)

Walk the ExecStart block (follow \-continuation lines) and insert
the two Standard* directives AFTER the last continuation line. The
nine single-line ExecStart templates are unaffected in shape but
re-written through the same path to keep the whole set uniform.
2026-04-24 01:03:58 -04:00
311da4098e fix(logging): don't crash the process if the system log can't be opened
_configure_logging opened InodeAwareRotatingFileHandler against
DECNET_SYSTEM_LOGS (default: relative decnet.system.log) without
guarding OSError. Under systemd with ProtectSystem=full +
ProtectHome=read-only and no writable path baked into the unit, the
first import of decnet.config raised OSError and the daemon died
before it could even print a useful error — the root-cause log line
showed up in journalctl as a stack trace rather than a warning.

Wrap the handler attachment in try/except OSError and log a single
WARNING via the already-installed stream handler. stderr is always
attached, so losing the file handler means operators tail
journalctl / docker logs instead — the daemon keeps running.
2026-04-24 01:00:42 -04:00
d4b714dc39 fix(deploy): wire per-unit log files on master systemd services
The agent-side enroll-bundle templates (decnet/web/templates/*) always
set DECNET_SYSTEM_LOGS + StandardOutput/StandardError to a per-unit
file under /var/log/decnet. The master-side init templates (deploy/*)
never did, so every 'decnet init'-installed service:

- inherited the default DECNET_SYSTEM_LOGS=decnet.system.log — a
  relative path, landing in the unit's WorkingDirectory. All 13 units
  shared the same cwd and fought for the same file, or more often
  just failed to write it under ProtectSystem=full,
- emitted stdout/stderr to the journal by default, which is fine for
  uvicorn's INFO banter but makes per-service grepping a pain when
  you're chasing a single worker's trace.

Mirror the agent-side wiring on all 13 master templates:
- Environment=DECNET_SYSTEM_LOGS=/var/log/decnet/decnet.<name>.log
- StandardOutput=append:/var/log/decnet/decnet.<name>.log
- StandardError=append:/var/log/decnet/decnet.<name>.log

/var/log/decnet is already in ReadWritePaths so ProtectSystem=full
stays compatible. Operators now get a dedicated
/var/log/decnet/decnet.<unit>.log per service, both from the app's
structured logger and from any stray stderr — journalctl still
works too, but no longer the only option.
2026-04-24 00:57:23 -04:00
c282f74bd4 fix(web/dashboard): wrap long kv-chips instead of blowing out the EVENT column
Key:value chips in the live-feed event cell used the default .chip
style, which is white-space: nowrap + inline-flex. A long cmd: value
(attacker-controlled shell strings, URLs, base64 payloads) stretched
the chip horizontally past the column, pushing the whole table into
horizontal scroll and clipping subsequent columns off-screen.

Add a chip-kv variant that allows the value to wrap inside a
max-width: 100% chip (word-break: break-word, overflow-wrap: anywhere
for dense strings with no natural break). The key-label stays on the
first line via flex-shrink: 0. Short values (uid: 0, user: root)
stay tight; long ones wrap onto multiple lines inside the chip.

Also set minWidth: 0 on the EVENT td + nested flex containers so
flex children honour the column width instead of growing to fit
content. Added title={k: v} on each chip for full-value hover in
case the wrap is still clipped.
2026-04-24 00:51:31 -04:00
bfff212a05 fix(api): gate embedded Docker-log collector on DECNET_EMBED_COLLECTOR
The API lifespan unconditionally spawned log_collector_worker,
appending every container line to DECNET_INGEST_LOG_FILE. On hosts
that also run decnet-collector.service (installed by 'decnet init')
that's two tailers writing the same events to the same file — the
ingester then inserts each event twice and the dashboard shows every
command duplicated.

Add DECNET_EMBED_COLLECTOR (default false), matching the existing
DECNET_EMBED_PROFILER and DECNET_EMBED_SNIFFER pattern directly
above this block. Single-process dev setups without systemd can flip
it on to restore the all-in-one behaviour; multi-process production
gets the single-writer invariant by default.
2026-04-24 00:47:37 -04:00
edc8297af3 fix(init): gate userdel/groupdel on --purge to avoid nuking the operator
Every plain `decnet deinit` ran userdel + groupdel unconditionally. In
dev the operator may pass `--user $USER --group $USER` to avoid file
ownership churn against a source checkout — at which point deinit
would cheerfully delete their own login account.

Move user/group removal behind --purge, matching the existing
behaviour for /var/lib/decnet + /var/log/decnet. Help text updated:
--purge now clearly advertises that it also wipes the service
user/group, with an explicit warning to only run it when `decnet init`
created the account in the first place.

Test updated: plain --deinit must NOT invoke userdel/groupdel;
--deinit --purge must.
2026-04-24 00:38:51 -04:00
38832d87d5 fix(init): thread --user / --group through systemd unit templates
Every decnet-*.service.j2 hardcoded User=decnet / Group=decnet. The
init CLI accepted --user / --group and used them for useradd,
chown, /etc/decnet ownership and ReadWritePaths — but the Jinja
context omitted them entirely, so

  sudo decnet init --install-dir $PWD --user anti --group anti

rendered

  User=decnet
  Group=decnet

into every unit, which at best ran the workers as a user that didn't
match the files (fails to read the venv / config), and at worst spun
a parallel system user the operator never asked for.

Swap the hardcoded lines to {{ user }} / {{ group }} across all 13
templates and add both to the Jinja context in _install_units.
2026-04-24 00:36:23 -04:00
51012eaa67 feat(init): decouple venv from install_dir; fail loud if no venv exists
The systemd unit templates hardcoded {{ install_dir }}/venv/bin/decnet.
On production hosts enroll_bootstrap.sh creates exactly that path so it
worked. On dev boxes where the operator runs `sudo decnet init` against
a source checkout with a differently-named venv (.venv, .311, .312),
every decnet-*.service looped forever in auto-restart with:

  Failed at step EXEC spawning .../venv/bin/decnet: No such file or
  directory

Templates now use {{ venv_dir }} as an independent Jinja2 var. `decnet
init` adds --venv-dir (explicit override), otherwise autodetects:

  1. $VIRTUAL_ENV (only when inside --install-dir, so a user-home venv
     never gets baked into a root-owned unit),
  2. {install_dir}/venv (production default; what enroll_bootstrap
     creates),
  3. {install_dir}/{.venv,.311,.312,.313} (common dev conventions).

Init aborts before any file writes if nothing resolves — an
operator-friendly error beats journalctl spam on every unit restart.

python3-venv doesn't set a persistent system variable — $VIRTUAL_ENV
lives in the activated shell only — so this has to be decided + baked
in at init time; there's no way for systemd to "inherit the current
venv" at unit start.

Test mode (--prefix) skips venv validation so the existing test suite
doesn't need to stub up a venv tree per case.
2026-04-24 00:29:49 -04:00
cb692d570a feat(cli): status queries systemd for every decnet-* unit
'decnet status' used to psutil-scan for cmdlines matching hand-coded
service launch args. That worked on dev boxes running workers via
'python -m decnet.cli ...' but missed the systemd reality on real
hosts: units may be installed but not started, failed, or in
auto-restart — all invisible to a cmdline grep.

New behaviour: status calls `systemctl list-units --type=service --all
--output=json 'decnet-*.service'` and renders the unit/load/active/
sub/description matrix. One view works for masters, agents, and
mixed hosts — iterates over whatever 'decnet-*' units were installed
by 'decnet init' / the enroll-bundle. Agent/master mode filtering is
no longer needed in the CLI; the host literally does not have
master-only units installed if it enrolled as an agent.

The psutil path survives as a fallback for boxes without systemd
(dev laptops, CI containers, minimal init systems) so the command
stays useful there. Clearly labelled 'psutil fallback' in the table
title so operators know which view they're looking at.
2026-04-24 00:23:00 -04:00
d61e143b71 fix(stress): unblock Locust runs from login rate-limit self-DoS
Locust spawns N virtual users (default 1000), all from 127.0.0.1 as
admin. /auth/login is rate-limited 10/5min per-IP AND per-username, so
the 11th on_start() got 429 and a RuntimeError. A @task(2) login in
the task weights turned the whole run into a 429 factory even after
ramp-up. And _login_with_retry treated 429 as non-retryable, so there
was no graceful degradation path.

Three changes, one root cause:

- decnet/web/limiter.py: read DECNET_LIMITER_ENABLED (default true).
  When false, slowapi's Limiter(enabled=False) makes @limiter.limit a
  no-op. Default ships unchanged; nobody should ever release with this
  off.
- tests/stress/conftest.py: set DECNET_LIMITER_ENABLED=false in the
  uvicorn subprocess env. Stress tests measure throughput, not rate
  limiting.
- tests/stress/locustfile.py: drop the @task(2) login — it added zero
  coverage (every user already logs in at on_start) and only generated
  contention. Teach _login_with_retry to honour 429 + Retry-After so a
  Locust pointed at a limiter-enabled server degrades gracefully
  instead of crashing on_start.
2026-04-24 00:13:15 -04:00
ae92948e22 test(live): align mqtt/postgres/mysql live tests with honeypot + loop realities
Three unrelated test-correctness fixes exposed by running tests/live:

- test_mqtt_live: honeypot defaults to auth-required (post-2018
  realistic broker). Anonymous CONNECT is rejected with CONNACK rc=5,
  which the "accept" / "subscribe" tests misread as a failure. Pass
  MQTT_ACCEPT_ALL=1 via a new env= override on the live_service factory
  so only those two tests opt into accept-all.
- test_postgres_live::test_auth_hash_logged: connected with
  dbname='prod', which isn't in the honeypot's per-instance DB list, so
  Postgres (correctly) rejected at startup before asking for a
  password — blowing past the auth event the test asserts on. Target
  'postgres' (always in _BASE_DBS) to reach the auth stage.
- test_mysql_backend_live: the module-scoped mysql_test_db_url fixture
  is bound to the module loop, but function-scoped tests default to
  their own per-function loops. Any reuse of the asyncmy pool then
  tripped "Future attached to a different loop". Pin the whole module
  with pytest.mark.asyncio(loop_scope='module').
2026-04-23 22:06:55 -04:00
26d04d5eb8 fix(db): SessionProfile.kd_digraph_simhash must be BINARY(8), not BLOB
MySQL can't index a BLOB/TEXT column without a prefix length, so
create_all() on a fresh MySQL schema blew up with "BLOB/TEXT column
'kd_digraph_simhash' used in key specification without a key length".

SimHashes are a fixed 8 bytes — the variable-length type was a
SQLAlchemy-side auto-mapping from 'Optional[bytes]', not an actual
schema requirement. Switch to BINARY(8), which is portable: MySQL gets
a fixed-width indexable BINARY, SQLite treats it as BLOB and doesn't
care about key length.
2026-04-23 22:06:38 -04:00
f0b0967b16 chore(gitignore): ignore dev-host noise (.311 venv, wiki clone, scratch logs)
- .311/ and .3[0-9][0-9]/ + .venv*/ — cpython-version-suffixed venvs
  (common convention) now covered alongside the existing .venv/.
- wiki-checkout/ — local nested clone of the wiki; never a submodule.
- hang.log / schem / *.pytest.log — scratch dumps from saved pytest
  output redirections.
- deps.txt — pydeps-style dependency graph from local analysis runs.

No tracked files affected; just stops new working-tree noise from
showing up in git status.
2026-04-23 21:52:40 -04:00
a6356abe27 docs(dev): post-v1 roadmap + check off shipped "Commands executed" item
- DEVELOPMENT_V2.md (new): post-v1 direction. Everything here is after
  the v1 box is closed — federation, advanced behavioral profiling,
  maze-scale topology work.
- DEVELOPMENT.md: flip "Commands executed" checkbox — full per-session
  command log already landed in the profiler's _extract_commands_from
  _events path.
2026-04-23 21:52:15 -04:00
6cbf8de6a8 docs: signal-capture audit + /api/v1 route audit
- SIGNAL_CAPTURE_AUDIT.md: end-to-end walkthrough of what attacker
  signals DECNET captures at each pipeline stage, where the gaps are
  (session profile ingestion, keystroke dynamics), and what ships for
  v1 vs what lands post-v1.
- api-audit.md: FastAPI /api/v1 route audit — surface area, auth
  requirements, status-code coverage, and where schema drift would bite
  the schemathesis suite.

Both are operator/engineering reference docs, not user-facing.
2026-04-23 21:52:05 -04:00
2ff392511b feat(templates): per-instance stealth seeding helper
Adds instance_seed.py to every service template (conpot, docker_api,
imap, k8s, llmnr, pop3, rdp, sip, smb, snmp, ssh, telnet, tftp, vnc).

Derives a stable per-instance seed from NODE_NAME (+ optional
INSTANCE_ID) and exposes deterministic helpers for the boring details
scanners would otherwise use to fingerprint the whole fleet as one
machine: cluster UUIDs, auth salts, uptime fixtures, minor version
strings. Connection-time jitter is intentionally NOT seeded — two hits
to the same decky must not replay the same latency curve.

Identical source across every template; lives next to each service so
the Docker build context picks it up without a shared package-data hop.
2026-04-23 21:51:51 -04:00
0eb0b32c7a refactor(swarm): enroll bundle switches from exclude list to include list
Exclude lists fail open — anything new at the master's repo root (venvs,
logs, dev notes, .env.local, local DB dumps) silently leaks into every
agent bundle. On this box a stray .311 venv (335 MB) + logs/ (220 MB)
bloated the tarball to ~150 MB and blew test_enroll_bundle timeouts.

Replace _EXCLUDES + _is_excluded with _INCLUDED_ROOT_FILES +
_INCLUDED_DIRS + _EXCLUDED_DECNET_SUBTREES and iterate via os.walk with
in-place dirnames[:] pruning so master-only subtrees (decnet/web,
decnet/mutator, decnet/profiler) and __pycache__ aren't descended into
at all.

Bundle contents are now strictly: pyproject.toml + the decnet/ package
minus the three master-only subtrees. Synthetic entries (INI, certs,
systemd units) unchanged — they were always added inline, not from the
tree walk.

test_enroll_bundle.py: 20/20 pass in 24s (was timing out at 15s/test).
2026-04-23 21:47:47 -04:00
ea95a009df refactor(tests): move flat tests/*.py into per-subsystem subfolders
Groups every flat test_*.py under the module it exercises, matching the
existing tests/{profiler,sniffer,prober,collector,correlation,cli,web,
topology,swarm,bus,updater,api,docker,geoip,...} layout. New folders:
services/, fleet/, config/, logging/, db/ (+ db/mysql/), telemetry/,
mutator/, core/.

Path-dependent __file__ references bumped an extra .parent in three
files that moved one level deeper:
- tests/sniffer/test_sniffer_ja3.py   (template path)
- tests/services/test_ssh_capture_emit.py (template path)
- tests/cli/test_mode_gating.py  (REPO root)
- tests/web/test_env_lazy_jwt.py (repo var)

Also drops two SQLite runtime artifacts (test_decnet.db-{shm,wal}) that
were leaking into the repo from a previous test run.

Fixes two test_service_isolation cases that patched asyncio.sleep (no
longer on the profiler main-loop hot path — same pre-existing bug I
fixed earlier in test_attacker_worker.py) by patching asyncio.wait_for
and passing interval=0.
2026-04-23 21:34:25 -04:00
21e6820714 feat(web/attackers): surface GeoIP country on list cards + detail page
- Attackers list: small country-code chip next to the IP on each card,
  title-tooltip shows the source (e.g. "rir")
- AttackerDetail: country-code tag next to the IP in the header plus an
  ORIGIN field in the TIMELINE section for always-visible origin
- TypeScript interfaces updated with country_code/country_source
2026-04-23 21:21:21 -04:00
1854f9de28 fix(tests): profiler worker tests patched asyncio.sleep, but main loop uses wait_for
Since the event-driven shutdown refactor (0fbb07c), the profiler main
loop is asyncio.wait_for(shutdown.wait(), timeout=interval) — no sleep
on the hot path. The four worker tests that patched asyncio.sleep to
raise CancelledError on the Nth call were silently no-op'ing and
hanging on the real 30 s wait_for timeout.

Replace the sleep patches with a shared _cancel_after helper that
patches wait_for itself. Pass interval=0 so the loop ticks without
delay between iterations.
2026-04-23 21:14:45 -04:00
ffc275f051 feat(geoip): country-code enrichment via RIR delegated-stats
Populates Attacker.country_code + country_source (MVP) using the five
RIR delegated-stats files (ARIN/RIPE/APNIC/LACNIC/AFRINIC). Offline,
license-free, no outbound traffic that could burn honeypot stealth.

- decnet.geoip package with factory/base/lookup + rir/ subpackage
  (fetch/parse/provider) mirroring the db + bus factory convention
- Profiler._build_record calls enrich_ip on every upsert
- Idempotent ALTER TABLE migrations for both SQLite and MySQL
- decnet geoip refresh/lookup CLI (master-only)
- /var/lib/decnet/geoip seeded by decnet init
- DECNET_GEOIP_ENABLED=false kill-switch; set in tests/conftest.py so
  unit tests never trigger the first-access fetch
2026-04-23 21:12:38 -04:00
07bf3dc8cb feat(config): promote /etc/decnet/decnet.ini to real config with domain sections
The config file `decnet init` dropped at /etc/decnet/config.ini was a
stub with a single [decnet] header saying 'reserved for future structured
settings.' Admins who wanted to tune DECNET_API_HOST, DECNET_DB_URL,
DECNET_BATCH_SIZE, etc. had to hunt env.py for the exact variable name
and drop it in .env.local.

Changes:
- decnet/config_ini.py — adds a _DOMAIN_MAP translation table covering
  [api], [web], [database], [bus], [swarm], [logging], [ingester],
  [tracing]. Loads regardless of mode; unknown keys inside a known
  section log a WARNING (operator typos shouldn't be silent).
  Explicit key map (not auto kebab-to-snake) so [web] admin-user lands
  in DECNET_ADMIN_USER without silently renaming the env-var contract
  consumers import from decnet.env.
- decnet/cli/init.py — renames the placeholder target config.ini →
  decnet.ini (unifies with the name already used by load_ini_config and
  the enroll bundle's _render_decnet_ini). Placeholder body now shows
  every domain section as a commented example so admins learn the
  shape by reading. Deinit removes both decnet.ini and the legacy
  config.ini so upgrading hosts leave no orphan file.

Precedence is unchanged: real env > INI > built-in default in env.py.
os.environ.setdefault means systemd EnvironmentFile= and one-off
DECNET_FOO=bar decnet ... invocations always win.

Secrets explicitly NOT moved to the INI:
 - DECNET_JWT_SECRET
 - DECNET_ADMIN_PASSWORD
 - DECNET_DB_PASSWORD
They stay in .env.local / EnvironmentFile= — never in a group-readable
INI, never in a diff, never on the dashboard.

Dev/profiling flags (DECNET_DEVELOPER, DECNET_EMBED_*, DECNET_PROFILE_*)
also stay env-only per maintainer direction — dev knobs shouldn't
be one 'I'll flip this for tonight' away.

Tests: +5 in test_config_ini.py (domain sections load regardless of mode,
env beats INI for domain keys, unknown key warns, absent section is
no-op, role section beats domain section via setdefault precedence). +1
in test_init.py (placeholder writes decnet.ini with every section
header present as commented guidance).

31 tests pass across the two files (was 26).
2026-04-23 18:21:00 -04:00
1753eca198 feat(deploy): templatize systemd services on install_dir via Jinja2
Distros reserve /opt for different things (some package managers own it
outright), and a DECNET install that wants to live at /srv/decnet or
/usr/local/decnet had to hand-edit 13 service files post-install.

Converts every deploy/decnet-*.service to a .j2 template keyed on
{{ install_dir }}, rendered by `decnet init` at install time. All other
paths (log_dir, state_dir, runtime_dir, user, group) stay standard —
only install_dir varies.

Changes:
- deploy/decnet-*.service → deploy/decnet-*.service.j2 (13 files).
- decnet init gains --install-dir (default /opt/decnet, preserves
  existing behaviour byte-for-byte). Validates absolute-path at the
  CLI boundary. Threads through useradd --home-dir and the dir-creation
  list so the filesystem layout matches the rendered templates.
- _install_units renders via Jinja2 with StrictUndefined (typo → loud
  error, not a silent broken unit). SHA over rendered output so
  operators with a custom install_dir get idempotent re-runs.
- decnet.target, tmpfiles.d, polkit rule stay static — they don't
  reference install paths.
- 4 new tests: custom install_dir renders into units, default remains
  /opt/decnet, relative paths rejected, second run with same custom
  dir is idempotent.
2026-04-23 18:08:26 -04:00
4418608a54 fix(bus): silently drop publishes on closed bus instead of raising
Worker bus instances (collector, ingester) close their private buses
in finally blocks on shutdown, but stream threads holding closure
references kept calling publish after close — one `RuntimeError:
publish on closed bus` per stream line, caught by publish_safely
and logged per call, flooding server logs.

Changes:
- `UnixSocketBus.publish()` now drops post-close calls. First drop
  WARNs loudly (bus is critical infra — silent drops would hide real
  problems); subsequent drops on the same instance log at DEBUG to
  prevent the flood. Sticky `_closed_publish_warned` flag, reset
  naturally per new bus instance.
- `make_thread_safe_publisher` short-circuits on a closed bus before
  marshalling a coroutine onto the loop. Avoids the wasted scheduling
  work in the hot shutdown path.

Degradation is safe: callers go through `publish_safely`, which
already treats exceptions as 'dropped notification, DB is source of
truth.' We just stop manufacturing the exception in the first place
for a known-benign condition.
2026-04-23 18:00:47 -04:00
eb2308d9e1 fix(bus): retry app-bus connect with backoff instead of one-shot veto
A startup race between `decnet bus` being ready and the API's lifespan
hitting `get_app_bus()` at api.py:135 would set `_tried = True`
permanently, poisoning the singleton for the rest of the process: the
dashboard shows BUS OFFLINE, topology SSE falls into the bus-is-None
snapshot-only branch, mutator publish calls no-op. Only an API
restart recovered.

Replaces the one-shot veto with a time-gated retry keyed on a
`_last_failure_ts` monotonic timestamp plus a 2 s backoff. Publishers
on the hot path still pay at most one connect attempt every 2 s when
the bus is down, but the singleton auto-recovers within 5 s (one
dashboard poll) once the bus comes up.

The asyncio lock still serialises concurrent callers so the bus server
doesn't get stampeded with parallel connect attempts on startup.
2026-04-23 17:59:17 -04:00
ef4179ea1f feat(api): opaque 500 handler + error_id correlation for unhandled exceptions
Registers a generic @app.exception_handler(Exception) that catches anything
uncaught in route handlers / dependencies. Prod response is opaque:
{detail: 'Internal Server Error', error_id: <uuid4 hex>}. Dev mode
(DECNET_DEVELOPER=True) adds exception_type and traceback fields so
failures are debuggable without tailing server logs.

The error_id is logged alongside the full traceback server-side, letting
operators correlate a user's 500 report with the exact exception via
`grep <error_id> /var/log/decnet.log`.

FastAPI's own HTTPException routing and the existing
RequestValidationError / ValidationError / RateLimitExceeded handlers
still take precedence — this handler only fires on genuinely-uncaught
exceptions.

Flips threat model F1/I 'traceback / stack trace leakage' from ? to M
and logs a follow-up checklist entry for 4 detail=str(e) sites in the
fleet deploy router (admin-gated, different threat class, separate
audit).
2026-04-23 14:07:32 -04:00
2f4f81e5de feat(api): rate-limit /auth/login + scaffold threat model
Adds slowapi two-bucket rate limit on /auth/login — 10 attempts per
5 minutes per-IP AND per-username, tripping either → 429. Per-IP
catches botnets hitting one account; per-username catches distributed
credential stuffing against one account. In-memory storage: dashboard
API is single-process, Redis is disproportionate for v1.

X-Forwarded-For is deliberately NOT trusted (spoofable); reverse-proxy
deployments get one shared bucket per proxy IP. Logged in the threat
model as accepted risk DA-08, to be revisited when a verified-proxy
config lands.

Also scaffolds development/THREAT_MODEL.md with STRIDE-per-element
methodology, system-context DFD, and Dashboard↔API as the first fully
worked component (7 sub-flows, ~50 threat entries). F1 Authn ships
with 3 threats mitigated: rate limit (new), uniform 401 (verified
already in place), bcrypt length clamp (verified already in place via
Pydantic max_length=72).
2026-04-23 13:25:28 -04:00
8cbb7834ef feat(web): SMTP victim-domain + stored-mail panels on attacker detail
Adds GET /attackers/{uuid}/smtp-targets (viewer) and GET /attackers/{uuid}/mail
(admin) endpoints, plus two new sections on the attacker detail page:
VICTIM DOMAINS rollup (aggregate-only, federation-gossip-safe) and STORED MAIL
with a drawer that decodes headers, lists attachments, and downloads the raw
.eml via the existing artifact endpoint (?service=smtp).
2026-04-22 22:33:53 -04:00
d43303251d feat(profiler): track SMTP victim domains per attacker
New SmtpTarget table records each (attacker, domain) pair observed via
the SMTP honeypots. Only the domain is stored — local-parts are dropped
at ingestion, so this table holds no user-identifying data beyond the
target organisation's identity.

The profiler worker extracts domains from rcpt_to / rcpt_denied /
message_accepted events, normalizes them (lowercase, strip local-part,
drop blocked TLDs), and upserts one row per pair with a running count +
first_seen / last_seen.

Three repo methods shipped:
  * increment_smtp_target(attacker, domain) — upsert + bump
  * list_smtp_targets(attacker) — per-attacker view
  * smtp_target_seen(domain) — cross-attacker aggregate, shaped as the
    federation-gossip RPC that V2 will expose.

The gossip-query shape is load-bearing: each operator can answer
"have any of your attackers targeted corp1.com?" without leaking
which attackers or when — the aggregate returns a bool + total count
+ first/last seen, nothing else.
2026-04-22 22:23:27 -04:00
c50448995b feat(smtp): capture full messages + attachments to disk
SMTP template now writes each accepted DATA body as a .eml file into a
bind-mounted per-decky quarantine dir and emits a `message_stored` log
with sha256, size, decoded headers, and an attachment manifest
(filename + sha256 + size + content-type). Attachment hashing uses the
*decoded* payload so operators can match against VT / MalwareBazaar
directly. Body accumulator is capped at SMTP_MAX_BODY_BYTES (default
10 MB, matching the EHLO SIZE advert) so a streaming client can't OOM
the container.

The existing /api/v1/artifacts/{decky}/{stored_as} endpoint now takes
an optional ?service= query param (defaults to ssh for back-compat)
and can serve .eml files out of the smtp subdir. Forensic metadata
rides the normal log pipeline, same as SSH file_captured.
2026-04-22 22:17:50 -04:00
d47a84c90b refactor(models): split models.py into topical submodules
decnet/web/db/models.py was approaching 1000 lines across User/Log/
Attacker/Swarm/Topology/Workers/Updater/Health domains. Split into a
package with one module per domain; __init__.py re-exports every symbol
so all 52 call sites keep importing from decnet.web.db.models
unchanged.
2026-04-22 21:55:41 -04:00
119b4e8724 feat(db): add session_profile table for keystroke-dynamics fingerprints
New purpose-built table with schema_version column committed from day one
so V2 federation gossip can cluster sessions across operators without
retrofitting. Ships with the empty write path (upsert_session_profile);
ingestion of keystroke features (IKI moments, control-char rates, digraph
SimHash) is tracked as V2 work.

Closes gap #2 from SIGNAL_CAPTURE_AUDIT.md.
2026-04-22 21:39:17 -04:00
d3321324eb feat(sniffer): capture SSH client banner from TCP stream
Parse RFC 4253 §4.2 identification strings from the first attacker→decky
data segment on TCP/22; emit ssh_client_banner syslog events and bus
fan-out. Profiler's sniffer_rollup dedupes observed banners into a new
AttackerBehavior.ssh_client_banners JSON column.

Closes gap #3 from SIGNAL_CAPTURE_AUDIT.md.
2026-04-22 21:37:01 -04:00
8181f39ae2 feat(profiler): persist raw SSH KEX algorithm ordering
Prober already emits kex_algorithms in hassh_fingerprint syslog events, but
the raw ordered list was only queryable via the generic bounty store. Add a
dedicated AttackerBehavior.kex_order_raw column (TEXT, JSON list) so
post-v1 KEX-order fingerprinting has a typed, indexable home.

Pipeline:
  - sniffer_rollup() now consumes hassh_fingerprint events and collects
    distinct kex_algorithms strings across ports.
  - build_behavior_record() JSON-encodes the list (NULL when empty).
  - sqlmodel_repo._deserialize_behavior() parses it back into a list.

Closes pre-v1 gap #1 from SIGNAL_CAPTURE_AUDIT.md.
2026-04-22 21:29:46 -04:00
25838eb9f3 refactor(profiler): split behavioral.py into topical modules
Break the 603-line behavioral.py into timing/classify/tools/phases/fingerprint
sibling modules plus a slim orchestrator. Public API unchanged: behavioral.py
re-exports every previously-exposed symbol, so worker.py and existing tests
keep working with zero import changes.

No behavior change; all 64 profiler tests pass.
2026-04-22 21:10:19 -04:00
b51095cec5 style(web): unify button sizing across pages (padding/font/spacing) 2026-04-22 18:35:40 -04:00
4bf671b316 style(web/topologies): unify header buttons with shared outlined style 2026-04-22 18:32:43 -04:00
9d64d8a046 style(web/mazenet): tint palette chip with user accent color for contrast 2026-04-22 18:29:19 -04:00
c804d3111a style(web/mazenet): enlarge palette port/proto chip for legibility 2026-04-22 18:26:29 -04:00
602a0e1efc feat(web/mazenet): add Mail, Comms, Observability, Containers groups + remaining services 2026-04-22 18:23:24 -04:00
9c38a3f11a feat(web/mazenet): group Service Fleet items by category (Remote Access, Web, Databases, etc.) 2026-04-22 18:19:21 -04:00
1674316788 feat(web/mazenet): glide transitions for service fleet + inspector panels 2026-04-22 18:16:17 -04:00
e0231bf990 style(web/mazenet): rename PALETTE toggle to SERVICE FLEET 2026-04-22 18:14:17 -04:00
e35358afd1 feat(web/mazenet): fullscreen button also triggers browser fullscreen API 2026-04-22 18:12:38 -04:00
ef34df4a7d feat(web/mazenet): fullscreen canvas mode (hides topbar + sidebar, Esc to exit) 2026-04-22 18:11:37 -04:00
31d02a9726 feat(web/mazenet): toggleable palette (deployer) panel 2026-04-22 18:10:18 -04:00
8985c28fab fix(web/mazenet): stop canvas from overflowing viewport (flex-size shell instead of fixed calc) 2026-04-22 18:08:44 -04:00
f3e366a2a3 fix(web/topologies): stop page from overflowing viewport (min-height off by topbar+padding) 2026-04-22 18:07:09 -04:00