verify_password / get_password_hash are CPU-bound and take ~250ms each
at rounds=12. Called directly from async endpoints, they stall every
other coroutine for that window, which makes them the biggest
bottleneck on the login path for a single worker.
Adds averify_password / ahash_password that wrap the sync versions in
asyncio.to_thread. Sync versions stay put because _ensure_admin_user and
tests still use them.
5 call sites updated across login, change-password, create-user, and
reset-password.
tests/test_auth_async.py asserts that parallel averify_password calls run
concurrently (about 1x the time of a single verify, not 2x).
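A minimal sketch of the wrapper pattern described above. The PBKDF2
functions are a stdlib stand-in for the real bcrypt-based helpers (an
assumption made to keep the example self-contained); only the two async
wrappers reflect what this commit describes.

```python
import asyncio
import hashlib
import hmac
import os


def get_password_hash(password: str) -> bytes:
    # Stand-in for the real hash: salted PBKDF2 from the stdlib.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt + digest


def verify_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)


async def ahash_password(password: str) -> bytes:
    # Run the CPU-bound hash in a worker thread so the event loop stays free.
    return await asyncio.to_thread(get_password_hash, password)


async def averify_password(password: str, stored: bytes) -> bool:
    return await asyncio.to_thread(verify_password, password, stored)
```

An async endpoint would then call `await averify_password(...)` where it
previously called `verify_password(...)` directly.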
_ensure_admin_user was strict insert-if-missing: once a stale hash landed
in decnet.db (e.g. from a deploy that used a different DECNET_ADMIN_PASSWORD),
login silently 401'd because changing the env var later had no effect.
Now on startup: if the admin still has must_change_password=True (they
never finalized their own password), re-sync the hash from the current
env var. Once the admin sets a real password, we leave it alone.
Found via locustfile.py login storm — see tests/test_admin_seed.py.
Note: this commit also bundles uncommitted pool-management work already
present in sqlmodel_repo.py from prior sessions.
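A sketch of the startup re-sync rule for the admin seed, assuming a
hypothetical AdminRow record and an injected hash_fn. The real
_ensure_admin_user works against the database and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AdminRow:
    # Hypothetical stand-in for the stored admin user.
    password_hash: str
    must_change_password: bool


def ensure_admin(
    existing: Optional[AdminRow],
    env_password: str,
    hash_fn: Callable[[str], str],
) -> AdminRow:
    if existing is None:
        # First start: seed the admin from the env var.
        return AdminRow(hash_fn(env_password), must_change_password=True)
    if existing.must_change_password:
        # Admin never finalized a password: follow the current env var.
        existing.password_hash = hash_fn(env_password)
    # Otherwise the admin chose a real password; leave it alone.
    return existing
```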
When decnet.system.log is root-owned (e.g. created by a pre-fix 'sudo
decnet deploy') and a subsequent non-root process tries to log, the
InodeAwareRotatingFileHandler raised PermissionError out of emit(),
which propagated up through logger.debug/info and killed the collector's
log stream loop ('log stream ended ... reason=[Errno 13]').
Now matches stdlib behaviour: wrap _open() in try/except OSError and
defer to handleError() on failure. Adds a regression test.
Also: the 'pyinstrument' keyword in scripts/profile/view.sh was matching
memray-flamegraph-*.html files. It now excludes the memray-* prefix.
Mirrors the inode-check fix from 935a9a5 (collector worker) for the
stdlib-handler-based log paths. Both decnet.system.log (config.py) and
decnet.log (logging/file_handler.py) now use a subclass that stats the
target path before each emit and reopens on inode/device mismatch —
matching the behavior of stdlib WatchedFileHandler while preserving
size-based rotation.
Previously: after rm decnet.system.log the handler kept writing to the
orphaned inode until maxBytes triggered a rollover; every line written in
between was lost.
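A sketch of what such a subclass could look like, not the project's actual
code. The class name comes from this log; the helper _stream_is_stale and
the exact error handling are assumptions. It stats the path before each
emit, reopens on inode/device mismatch, and hands OSError to handleError()
instead of raising.

```python
import logging
import logging.handlers
import os


class InodeAwareRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """Reopen the log file when the path no longer points at the open stream."""

    def _stream_is_stale(self) -> bool:
        if self.stream is None:
            return False  # nothing open yet; the parent class opens lazily
        try:
            on_disk = os.stat(self.baseFilename)
        except FileNotFoundError:
            return True  # file was deleted or renamed away
        opened = os.fstat(self.stream.fileno())
        return (on_disk.st_dev, on_disk.st_ino) != (opened.st_dev, opened.st_ino)

    def emit(self, record: logging.LogRecord) -> None:
        try:
            if self._stream_is_stale():
                self.stream.close()
                self.stream = None
                self.stream = self._open()
        except OSError:
            # e.g. PermissionError on a root-owned file: report, do not raise
            self.handleError(record)
            return
        super().emit(record)
```

Size-based rotation is untouched because the parent emit() still runs its
own rollover check after the inode check.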
'sudo decnet deploy' needs root for MACVLAN, but the log files it creates
(decnet.log and decnet.system.log) end up owned by root. A subsequent
non-root 'decnet api' then crashes on PermissionError appending to them.
New decnet.privdrop helper reads SUDO_UID/SUDO_GID and chowns files/dirs
back to the invoking user. Best-effort: no-op when not root, not under
sudo, path missing, or chown fails. Applied at both log-file creation
sites (config.py system log, logging/file_handler.py syslog file).
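A best-effort sketch of the chown step. The function name
chown_to_invoking_user is hypothetical; the log only states that the helper
lives in decnet.privdrop and reads SUDO_UID/SUDO_GID.

```python
import os


def chown_to_invoking_user(path: str) -> bool:
    """Hand path back to the user who ran sudo. Returns True if chowned."""
    if not hasattr(os, "geteuid") or os.geteuid() != 0:
        return False  # not root: nothing to drop
    uid = os.environ.get("SUDO_UID")
    gid = os.environ.get("SUDO_GID")
    if uid is None or gid is None:
        return False  # root, but not via sudo
    try:
        os.chown(path, int(uid), int(gid))
    except (OSError, ValueError):
        return False  # missing path, failed chown, or malformed env value
    return True
```

Every failure path returns False rather than raising, so callers can invoke
it right after creating a log file without extra guards.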
The API's lifespan unconditionally spawned a MACVLAN sniffer task, which
duplicated the standalone 'decnet sniffer --daemon' process that
'decnet deploy' always starts — causing two workers to sniff the same
interface, double events, and wasted CPU.
Mirror the existing DECNET_EMBED_PROFILER pattern: sniffer is OFF by
default, opt in explicitly. Static regression tests guard against
accidental removal of the gate.
Dispatches by extension: .prof -> snakeviz, memray .bin -> memray flamegraph
(overridable via VIEW=table|tree|stats|summary|leaks), .svg/.html -> xdg-open.
Positional arg can be a file path or a type keyword (cprofile, memray, pyspy,
pyinstrument).
Root cause of 'No python processes found in process <pid>': py-spy needs
per-release ABI knowledge and 0.4.1 (latest PyPI) predates 3.14. Wrapper
now detects the interpreter and points users at pyinstrument/memray/cProfile.
The builder in decnet/web/db/mysql/database.py emits 'mysql+asyncmy://' URLs
(asyncmy is the declared dep in pyproject.toml). Tests were stale from a
prior aiomysql era.
New `profile` optional-deps group, opt-in Pyinstrument ASGI middleware
gated by DECNET_PROFILE_REQUESTS, bench marker + tests/perf/ micro-benchmarks
for repository hot paths, and scripts/profile/ helpers for py-spy/cProfile/memray.
Root cause: test_schemathesis.py mutates decnet.web.auth.SECRET_KEY at
module-level import time, poisoning JWT verification for all other tests
in the same process — even when fuzz tests are deselected.
- Add pytest_ignore_collect hook in tests/api/conftest.py to skip
collecting test_schemathesis.py unless -m fuzz is selected
- Add --dist loadscope to addopts so xdist groups by module (protects
module-scoped fixtures in live tests)
- Remove now-unnecessary xdist_group markers from live test classes
- Add 403 response to all RBAC-gated endpoints (schemathesis UndefinedStatusCode)
- Add 400 response to all endpoints accepting JSON bodies (malformed input)
- Add required 'title' field to schemathesis.toml for schemathesis 4.15+
- Add xdist_group markers to live tests with module-scoped fixtures to
prevent xdist from distributing them across workers (fixture isolation)
Replace brittle explicit method-by-method proxy with __getattr__-based
dynamic proxy that forwards all args/kwargs to the inner repo. Fixes
TypeError on get_logs_after_id() where concrete repo accepts extra
kwargs beyond the ABC signature.
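A sketch of the __getattr__ approach. DynamicRepoProxy and FakeRepo are
hypothetical names, and the calls list stands in for whatever tracing the
real proxy performs. The point is that every argument is forwarded
untouched, so extra kwargs on the concrete repo keep working.

```python
import functools


class DynamicRepoProxy:
    def __init__(self, inner):
        self._inner = inner
        self.calls = []  # stand-in for span emission

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for the inner repo's API.
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def wrapper(*args, **kwargs):
            self.calls.append(name)
            return attr(*args, **kwargs)

        return wrapper


class FakeRepo:
    # Concrete signature is wider than a hypothetical ABC's (last_id only).
    def get_logs_after_id(self, last_id, limit=100, service=None):
        return (last_id, limit, service)
```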
Pin DECNET_DEVELOPER_TRACING=false in conftest.py so .env.local
settings don't leak into the test suite.
Gated by DECNET_DEVELOPER_TRACING env var (default off, zero overhead).
When enabled, traces flow through FastAPI routes, background workers
(collector, ingester, profiler, sniffer, prober), engine/mutator
operations, and all DB calls via TracedRepository proxy.
Includes Jaeger docker-compose for local dev and 18 unit tests.
The collector spawned one permanent thread per Docker container via
asyncio.to_thread(), saturating the default asyncio executor. This
starved short-lived to_thread(load_state) calls in get_deckies() and
get_stats_summary(), causing the SSE stream and deckies endpoints to
hang indefinitely while other DB-only endpoints worked fine.
Give the collector and sniffer their own ThreadPoolExecutor so they
never compete with the default pool.
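A sketch of routing long-lived work to a dedicated pool. The pool size,
thread name prefix, and helper name are assumptions; the log only says the
collector and sniffer get their own ThreadPoolExecutor.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool: permanent per-container threads live here, not in the
# default executor that asyncio.to_thread() uses.
_collector_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="collector")


async def run_in_collector_pool(fn, *args):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_collector_pool, fn, *args)
```

Short-lived asyncio.to_thread() calls elsewhere keep using the default
executor and are no longer queued behind the collector's threads.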
The active prober emits tcpfp_fingerprint events with TTL, window, MSS etc.
from the attacker's SYN-ACK. These were invisible to the behavioral profiler
for two reasons:
1. target_ip (prober's field name for attacker IP) was not in _IP_FIELDS in
collector/worker.py or correlation/parser.py, so the profiler re-parsed
raw_lines and got attacker_ip=None, never attributing prober events to
the attacker profile.
2. sniffer_rollup only handled tcp_syn_fingerprint (passive sniffer) and
ignored tcpfp_fingerprint (active prober). Prober events use different
field names: window_size/window_scale/sack_ok vs window/wscale/has_sack.
Changes:
- Add target_ip to _IP_FIELDS in collector and parser
- Add _PROBER_TCPFP_EVENT and _INITIAL_TTL table to behavioral.py
- sniffer_rollup now processes tcpfp_fingerprint: maps field names, derives
OS from TTL via _os_from_ttl, computes hop_distance = initial_ttl - observed
- Expand prober DEFAULT_TCPFP_PORTS to [22,80,443,8080,8443,445,3389] for
better SYN-ACK coverage on attacker machines
- Add 4 tests covering prober OS detection, hop distance, and field mapping
templates/decnet_logging.py calls str(v) on all SD-PARAM values, turning a
headers dict into its Python repr ({'User-Agent': ...}) rather than JSON.
detect_tools_from_headers() called json.loads() on that string and silently
swallowed the error, returning [] for every HTTP event. Same bug prevented
the ingester from extracting User-Agent bounty fingerprints.
- templates/http/server.py: wrap headers dict in json.dumps() before passing
to syslog_line so the value is a valid JSON string in the syslog record
- behavioral.py: add ast.literal_eval fallback for existing DB rows that were
stored with the old Python repr format
- ingester.py: parse headers as JSON string in _extract_bounty so User-Agent
fingerprints are stored correctly going forward
- tests: add test_json_string_headers and test_python_repr_headers_fallback
to exercise both formats in detect_tools_from_headers
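A sketch of the two-format parsing described above. parse_headers is a
hypothetical helper name; the real logic sits inside
detect_tools_from_headers() and _extract_bounty.

```python
import ast
import json


def parse_headers(raw) -> dict:
    if isinstance(raw, dict):
        return raw
    try:
        parsed = json.loads(raw)  # new format: JSON string
    except (TypeError, ValueError):
        try:
            parsed = ast.literal_eval(raw)  # legacy rows: Python repr
        except (ValueError, SyntaxError, TypeError):
            return {}
    return parsed if isinstance(parsed, dict) else {}
```

ast.literal_eval only evaluates literals, so the fallback does not execute
arbitrary code from stored rows.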
Replaces the single persistent open() with inode-based reopen logic.
If decnet.log or decnet.json is deleted or renamed by logrotate, the
next write detects the stale inode, closes the old handle, and creates
a fresh file — preventing silent data loss to orphaned inodes.
- Ingester now loads byte-offset from DB on startup (key: ingest_worker_position)
and saves it after each batch — prevents full re-read on every API restart
- On file truncation/rotation the saved offset is reset to 0
- Profiler worker now loads last_log_id from DB on startup — every restart
becomes an incremental update instead of a full cold rebuild
- Updated all affected tests to mock get_state/set_state; added new tests
covering position restore, set_state call, truncation reset, and cursor
restore/cold-start paths
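A sketch of the offset handling, using a plain dict in place of the
repository's get_state/set_state. The function name read_new_bytes is
hypothetical; the state key is the one named above.

```python
import os

STATE_KEY = "ingest_worker_position"


def read_new_bytes(path: str, state: dict) -> bytes:
    offset = state.get(STATE_KEY, 0)
    size = os.path.getsize(path)
    if offset > size:
        # File shrank: it was truncated or rotated, so start over.
        offset = 0
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Persist the new position after the batch.
    state[STATE_KEY] = offset + len(data)
    return data
```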
Cold start fetched all logs in one bulk query then processed them in a tight
synchronous loop with no yields, blocking the asyncio event loop for seconds
on datasets of 30K+ rows. This stalled every concurrent await — including the
SSE stream generator's initial DB calls — causing the dashboard to show
INITIALIZING SENSORS indefinitely.
Changes:
- Drop _cold_start() and get_all_logs_raw(); uninitialized state now runs the
same cursor loop as incremental, starting from last_log_id=0
- Yield to the event loop after every _BATCH_SIZE rows (asyncio.sleep(0))
- Add SSE keepalive comment as first yield so the connection flushes before
any DB work begins
- Add Cache-Control/X-Accel-Buffering headers to StreamingResponse
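A sketch of the two event-loop fixes above: yielding every _BATCH_SIZE rows
and sending an SSE comment before any other work. The batch size value and
both function names are assumptions.

```python
import asyncio

_BATCH_SIZE = 500  # assumed value; the real constant is not shown


async def process_rows(rows, handle) -> int:
    processed = 0
    for row in rows:
        handle(row)
        processed += 1
        if processed % _BATCH_SIZE == 0:
            # Hand control back so other coroutines can make progress.
            await asyncio.sleep(0)
    return processed


async def sse_stream(events):
    # Lines starting with ':' are SSE comments; clients ignore them, but
    # the bytes flush the connection before any slow work begins.
    yield ": keepalive\n\n"
    for event in events:
        yield f"data: {event}\n\n"
        await asyncio.sleep(0)
```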
Existing MySQL databases hit a DataError when the commands/fingerprints
JSON blobs exceed 64 KiB (TEXT limit). _BIG_TEXT emits MEDIUMTEXT only
at CREATE TABLE time; create_all() is a no-op on existing columns.
Add MySQLRepository._migrate_column_types() that queries
information_schema and issues ALTER TABLE … MODIFY COLUMN … MEDIUMTEXT
for the five affected columns (commands, fingerprints, services, deckies,
state.value) whenever they are still TEXT. Called from an overridden
initialize() after _migrate_attackers_table() and before create_all().
Add tests/test_mysql_migration.py covering: ALTER issued for TEXT columns,
no-op for already-MEDIUMTEXT, idempotency, DEFAULT clause correctness,
and initialize() call order.
- test_mysql_backend_live.py: live integration tests for MySQL connections
- test_mysql_histogram_sql.py: dialect-specific histogram query tests
- test_mysql_url_builder.py: MySQL connection string construction
- mysql_spinup.sh: Docker spinup script for local MySQL testing
Connection-lifecycle events (connect, disconnect, accept, close) fire once
per TCP connection. During a portscan or credential-stuffing run this
firehoses the SQLite ingester with tiny WAL writes and starves all reads
until the queue drains.
The collector now deduplicates these events by
(attacker_ip, decky, service, event_type) over a 1-second window before
writing to the .json ingestion stream. The raw .log file is untouched, so
rsyslog/SIEM still see every event for forensic fidelity.
Tunable via DECNET_COLLECTOR_RL_WINDOW_SEC and DECNET_COLLECTOR_RL_EVENT_TYPES.
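A sketch of the windowed dedup keyed on (attacker_ip, decky, service,
event_type). LifecycleDeduper is a hypothetical name, and the injectable
clock exists only to make the example testable.

```python
import time


class LifecycleDeduper:
    def __init__(
        self,
        window_sec: float = 1.0,
        event_types=("connect", "disconnect", "accept", "close"),
        clock=time.monotonic,
    ):
        self.window_sec = window_sec
        self.event_types = frozenset(event_types)
        self.clock = clock
        # Grows with the number of distinct keys; a real worker would prune it.
        self._last_seen = {}

    def should_emit(self, attacker_ip, decky, service, event_type) -> bool:
        if event_type not in self.event_types:
            return True  # only lifecycle events are rate-limited
        key = (attacker_ip, decky, service, event_type)
        now = self.clock()
        last = self._last_seen.get(key)
        if last is not None and now - last < self.window_sec:
            return False  # duplicate inside the window: skip the .json write
        self._last_seen[key] = now
        return True
```

The raw .log write would happen before this check, so only the .json
ingestion stream is thinned.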
The live test modules set DECNET_CONTRACT_TEST=true at module level,
which persisted across xdist workers and caused the mutate endpoint
to short-circuit before the mock was reached. Clear the env var in
affected tests with monkeypatch.delenv.
21 live tests covering all background workers against real resources:
collector (real Docker daemon), ingester (real filesystem + DB),
attacker worker (real DB profiles), sniffer (real network interfaces),
API lifespan (real health endpoint), and cross-service cascade isolation.
9 tests covering auth enforcement, component reporting, status
transitions, degraded mode, and real DB/Docker state validation.
Runs with -m live alongside other live service tests.
23 tests verifying that each background worker degrades gracefully
when its dependencies are unavailable, and that failures don't cascade:
- Collector: Docker unavailable, no state file, empty fleet
- Ingester: missing log file, unset env var, malformed JSON, fatal DB
- Attacker: DB errors, empty database
- Sniffer: missing interface, no state, scapy crash, non-decky traffic
- API lifespan: all workers failing, DB init failure, sniffer import fail
- Cascade: collector→ingester, ingester→attacker, sniffer→collector, DB→sniffer
Replace per-decky sniffer containers with a single host-side sniffer
that monitors all traffic on the MACVLAN interface. Runs as a background
task in the FastAPI lifespan alongside the collector, fully fault-isolated
so failures never crash the API.
- Add fleet_singleton flag to BaseService; sniffer marked as singleton
- Composer skips fleet_singleton services in compose generation
- Fleet builder excludes singletons from random service assignment
- Extract TLS fingerprinting engine from templates/sniffer/server.py
into decnet/sniffer/ package (parameterized for fleet-wide use)
- Sniffer worker maps packets to deckies via IP→name state mapping
- Original templates/sniffer/server.py preserved for future use
Extends the prober with two new active probe types alongside JARM:
- HASSHServer: SSH server fingerprinting via KEX_INIT algorithm ordering
(MD5 hash of kex;enc_s2c;mac_s2c;comp_s2c, pure stdlib)
- TCP/IP stack: OS/tool fingerprinting via SYN-ACK analysis using scapy
(TTL, window size, DF bit, MSS, TCP options ordering, SHA256 hash)
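A sketch of the HASSHServer hash from the first bullet, assuming the four
algorithm lists have already been parsed out of the server's KEX_INIT. The
function name is hypothetical. Each list is comma-joined in the order the
server sent it, and the four strings are joined with ';'.

```python
import hashlib


def hassh_server(kex, enc_s2c, mac_s2c, comp_s2c) -> str:
    # Order matters: the fingerprint encodes the server's preference order.
    material = ";".join(",".join(algos) for algos in (kex, enc_s2c, mac_s2c, comp_s2c))
    # MD5 is used as a fingerprint here, not for security.
    return hashlib.md5(material.encode("ascii"), usedforsecurity=False).hexdigest()
```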
Worker probe cycle now runs three phases per IP with independent
per-type port tracking. Ingester extracts bounties for all three
fingerprint types.
Reverts commits 8c249f6, a6c7cfd, 7ff5703. The SSH log relay approach
requires container redeployment and doesn't retroactively fix existing
attacker profiles. Rolling back to reassess the approach.
New log_relay.py replaces raw 'cat' on the rsyslog pipe. Intercepts
sshd and bash lines and re-emits them as structured RFC 5424 events:
login_success, session_opened, disconnect, connection_closed, command.
Parsers updated to accept non-nil PROCID (sshd uses PID).
The SSH honeypot logs commands via PROMPT_COMMAND logger as:
<14>1 ... bash - - - CMD uid=0 pwd=/root cmd=ls
These lines had service=bash and event_type=-, so the attacker worker
never recognized them as commands. Both the collector and correlation
parsers now detect the CMD pattern and normalize to service=ssh,
event_type=command, with uid/pwd/command in fields.
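A sketch of the CMD normalization, assuming the message part looks like the
example line above. The regex and function name are illustrative; it treats
pwd as a single whitespace-free token, which the real parsers may handle
differently.

```python
import re

_CMD_RE = re.compile(r"CMD uid=(?P<uid>\d+) pwd=(?P<pwd>\S+) cmd=(?P<command>.*)$")


def normalize_bash_cmd(message: str):
    match = _CMD_RE.search(message)
    if match is None:
        return None  # not a PROMPT_COMMAND line; leave it to other parsers
    return {
        "service": "ssh",
        "event_type": "command",
        "fields": match.groupdict(),
    }
```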
New GET /attackers/{uuid}/commands?limit=&offset=&service= endpoint
serves commands with server-side pagination and optional service filter.
AttackerDetail frontend fetches commands from this endpoint with
page controls. Service badge filter now drives both the API query
and the local fingerprint filter.
API now accepts ?service=https to filter attackers by targeted service.
Service badges are clickable in both the attacker list and detail views,
navigating to a filtered view. Active filter shows as a dismissable tag.
TLS-wrapped variant of the HTTP honeypot. Auto-generates a self-signed
certificate on startup if none is provided. Supports all the same persona
options (fake_app, server_header, custom_body, etc.) plus TLS_CERT,
TLS_KEY, and TLS_CN configuration.
EHLO/HELO require a domain or address-literal argument. Previously
the server accepted bare EHLO with no argument and responded 250,
which deviates from the SMTP spec (RFC 5321) and makes the honeypot
easier to fingerprint.
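A sketch of the argument check. The function name, hostname, and reply text
are illustrative assumptions; only the 501 (syntax error in parameters) and
250 reply codes are standard SMTP.

```python
def reply_to_hello(line: str, hostname: str = "mail.example.com") -> str:
    parts = line.strip().split(None, 1)
    verb = parts[0].upper() if parts else ""
    if verb not in ("EHLO", "HELO"):
        raise ValueError("not an EHLO/HELO line")
    if len(parts) < 2:
        # Bare EHLO/HELO: reject instead of answering 250.
        return f"501 Syntax: {verb} hostname"
    return f"250 {hostname}"
```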
The collector kept streaming stale container IDs after a redeploy,
causing new service logs to never reach decnet.log. Now _kill_api()
also matches and SIGTERMs any running decnet.cli collect process.
Two bugs fixed:
- data_received only split on CRLF, so clients sending bare LF (telnet, nc,
some libraries) got no responses at all. Now splits on LF and strips
trailing CR, matching real Postfix behavior.
- AUTH PLAIN without inline credentials set state to "await_plain" but no
handler existed for that state, causing the next line to be dispatched as
a normal command. Added the missing state handler.
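A sketch of the line-splitting fix from the first bullet. LineBuffer is a
hypothetical helper; the real logic lives in the protocol's data_received.
It splits on LF, strips trailing CR, and keeps any partial line buffered
for the next chunk.

```python
class LineBuffer:
    def __init__(self):
        self._buf = b""

    def feed(self, data: bytes) -> list:
        self._buf += data
        # Everything before the last LF is complete; the rest stays buffered.
        *lines, self._buf = self._buf.split(b"\n")
        return [line.rstrip(b"\r") for line in lines]
```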