DECNET

Author	SHA1	Message	Date
anti	f1e14280c0	perf: 1s TTL cache for /health DB probe and /config state reads Locust hit /health and /config on every @task(3), so each request was firing repo.get_total_logs() and two repo.get_state() calls against aiosqlite — filling the driver queue for data that changes on the order of seconds, not milliseconds. Both caches follow the shape already used by the existing Docker cache: - asyncio.Lock with double-checked TTL so concurrent callers collapse into one DB hit per 1s window. - _reset_* helpers called from tests/api/conftest.py::setup_db so the module-level cache can't leak across tests. tests/test_health_config_cache.py asserts 50 concurrent callers produce exactly 1 repo call, and the cache expires after TTL.	2026-04-17 15:05:18 -04:00
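The double-checked TTL pattern in this commit can be sketched as a small single-value cache. This is a minimal sketch, not the repository's code. `AsyncTTLCache`, `get()` and `reset()` are hypothetical names, and the `loader` argument stands in for calls such as `repo.get_total_logs()`.

```python
import asyncio
import time
from typing import Awaitable, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class AsyncTTLCache(Generic[T]):
    """Caches one awaited value; concurrent misses collapse into one loader call."""

    def __init__(self, ttl: float = 1.0) -> None:
        self._ttl = ttl
        self._value: Optional[T] = None
        self._stamp = float("-inf")  # monotonic time of the last refresh
        self._lock = asyncio.Lock()

    def _fresh(self) -> bool:
        return time.monotonic() - self._stamp < self._ttl

    async def get(self, loader: Callable[[], Awaitable[T]]) -> T:
        if self._fresh():  # fast path: no lock while the value is fresh
            return self._value  # type: ignore[return-value]
        async with self._lock:
            # Re-check: a coroutine that held the lock before us may have refreshed it.
            if not self._fresh():
                self._value = await loader()
                self._stamp = time.monotonic()
            return self._value  # type: ignore[return-value]

    def reset(self) -> None:
        """Test helper, analogous to the commit's _reset_* functions."""
        self._value = None
        self._stamp = float("-inf")
```

Only the first caller in a window awaits the loader. Every other caller either hits the lock-free fast path or waits on the lock and then finds the value already fresh. If the loader raises, the timestamp is not advanced, so the next waiter retries.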
anti	931f33fb06	perf: cache Docker daemon ping in /health (5s TTL) Creating a new docker.from_env() client per /health request opened a fresh unix-socket connection each time. Under load that's wasteful and hammers dockerd. Keep a module-level client + last-check timestamp; actually ping every 5 seconds, return cached state in between. Reset helper provided for tests.	2026-04-17 15:01:53 -04:00
anti	467511e997	db: switch MySQL driver to asyncmy, env-tune pool, serialize DDL - aiomysql → asyncmy on both sides of the URL/import (faster, maintained). - Pool sizing now reads DECNET_DB_POOL_SIZE / MAX_OVERFLOW / RECYCLE / PRE_PING for both SQLite and MySQL engines so stress runs can bump without code edits. - MySQL initialize() now wraps schema DDL in a GET_LOCK advisory lock so concurrent uvicorn workers racing create_all() don't hit 'Table was skipped since its definition is being modified by concurrent DDL'. - sqlite & mysql repo get_log_histogram use the shared _session() helper instead of session_factory() for consistency with the rest of the repo. - SSE stream_events docstring updated to asyncmy.	2026-04-17 15:01:49 -04:00
anti	3945e72e11	perf: run bcrypt on a thread so it doesn't block the event loop verify_password / get_password_hash are CPU-bound and take ~250ms each at rounds=12. Called directly from async endpoints, they stall every other coroutine for that window — the single biggest single-worker bottleneck on the login path. Adds averify_password / ahash_password that wrap the sync versions in asyncio.to_thread. Sync versions stay put because _ensure_admin_user and tests still use them. 5 call sites updated: login, change-password, create-user, reset-password. tests/test_auth_async.py asserts parallel averify runs concurrently (~1x of a single verify, not 2x).	2026-04-17 14:52:22 -04:00
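A minimal sketch of the `asyncio.to_thread` wrappers described above. The async names match the commit. To keep the sketch free of third-party dependencies, the sync helpers here use `hashlib.pbkdf2_hmac` as a stand-in for bcrypt, and the hash string format is an assumption.

```python
import asyncio
import hashlib
import hmac
import os

# Stand-in hashing (assumption): the project's sync helpers use bcrypt at
# rounds=12. PBKDF2 is used here only because it ships with the stdlib and is
# likewise CPU-bound.
_ITERATIONS = 100_000


def get_password_hash(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(plain: str, hashed: str) -> bool:
    salt_hex, digest_hex = hashed.split("$", 1)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", plain.encode(), bytes.fromhex(salt_hex), _ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


# Async wrappers: run the CPU-bound work in the default thread pool so the
# event loop keeps serving other coroutines while a hash is computed.
async def ahash_password(password: str) -> str:
    return await asyncio.to_thread(get_password_hash, password)


async def averify_password(plain: str, hashed: str) -> bool:
    return await asyncio.to_thread(verify_password, plain, hashed)
```

Offloading keeps the event loop responsive in any case. Two verifies only overlap in wall time if the hash function releases the GIL while it runs, and the commit's test asserting roughly 1x wall time for parallel `averify` calls relies on that.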
anti	bd406090a7	fix: re-seed admin password when still unfinalized (must_change_password=True) _ensure_admin_user was strict insert-if-missing: once a stale hash landed in decnet.db (e.g. from a deploy that used a different DECNET_ADMIN_PASSWORD), login silently 401'd because changing the env var later had no effect. Now on startup: if the admin still has must_change_password=True (they never finalized their own password), re-sync the hash from the current env var. Once the admin sets a real password, we leave it alone. Found via locustfile.py login storm — see tests/test_admin_seed.py. Note: this commit also bundles uncommitted pool-management work already present in sqlmodel_repo.py from prior sessions.	2026-04-17 14:49:13 -04:00
anti	e22d057e68	added: scripts/profile/aggregate_requests.py — roll up pyinstrument request profiles Parses every HTML in profiles/, reattributes [self]/[await] synthetic leaves to their parent function, and reports per-endpoint wall-time (mean/p50/p95/max) plus top hot functions by cumulative self-time. Makes post-locust profile dirs actually readable — otherwise they're just a pile of hundred-plus HTML files.	2026-04-17 14:48:59 -04:00
anti	cb12e7c475	fix: logging handler must not crash its caller on reopen failure When decnet.system.log is root-owned (e.g. created by a pre-fix 'sudo decnet deploy') and a subsequent non-root process tries to log, the InodeAwareRotatingFileHandler raised PermissionError out of emit(), which propagated up through logger.debug/info and killed the collector's log stream loop ('log stream ended ... reason=[Errno 13]'). Now matches stdlib behaviour: wrap _open() in try/except OSError and defer to handleError() on failure. Adds a regression test. Also: scripts/profile/view.sh 'pyinstrument' keyword was matching memray-flamegraph-.html files. Exclude the memray- prefix.	2026-04-17 14:01:36 -04:00
anti	c29ca977fd	added: scripts/profile/classify_usage.py — classify memray usage_over_time.csv Reads the memray usage CSV and emits a verdict based on tail-drop-from-peak: CLIMB-AND-DROP, MOSTLY-RELEASED, or SUSTAINED-AT-PEAK. Deliberately ignores net-growth-vs-baseline since any active workload grows vs. a cold interpreter — that metric is misleading as a leak signal.	2026-04-17 13:54:37 -04:00
anti	bf4afac70f	fix: RotatingFileHandler reopens on external deletion/rotation Mirrors the inode-check fix from `935a9a5` (collector worker) for the stdlib-handler-based log paths. Both decnet.system.log (config.py) and decnet.log (logging/file_handler.py) now use a subclass that stats the target path before each emit and reopens on inode/device mismatch — matching the behavior of stdlib WatchedFileHandler while preserving size-based rotation. Previously: rm decnet.system.log → handler kept writing to the orphaned inode until maxBytes triggered; all lines between were lost.	2026-04-17 13:42:15 -04:00
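A minimal sketch of a rotating handler that stats its target before each emit. The class name is the one mentioned in commit cb12e7c475, but the body is illustrative, not the project's code. The `try/except OSError` around the reopen mirrors that later fix, which defers to `handleError()` instead of raising into the caller.

```python
import logging
import logging.handlers
import os
from typing import Any, Tuple


class InodeAwareRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """Size-based rotation plus WatchedFileHandler-style reopen on inode change."""

    def __init__(self, filename: str, **kwargs: Any) -> None:
        super().__init__(filename, **kwargs)
        self._dev, self._ino = self._stat_stream()

    def _stat_stream(self) -> Tuple[int, int]:
        if self.stream is None:  # delay=True, or a previous reopen failed
            return -1, -1
        st = os.fstat(self.stream.fileno())
        return st.st_dev, st.st_ino

    def _reopen_if_needed(self) -> None:
        try:
            st = os.stat(self.baseFilename)
            changed = (st.st_dev, st.st_ino) != (self._dev, self._ino)
        except FileNotFoundError:
            changed = True  # deleted out from under us
        if not changed:
            return
        if self.stream is not None:
            self.stream.close()
            self.stream = None
        self.stream = self._open()  # may raise OSError, e.g. EACCES
        self._dev, self._ino = self._stat_stream()

    def doRollover(self) -> None:
        super().doRollover()
        # Our own rotation also changes the inode; record the new one.
        self._dev, self._ino = self._stat_stream()

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._reopen_if_needed()
        except OSError:
            self.handleError(record)  # report, but never raise into the caller
            return
        super().emit(record)
```

Overriding `doRollover()` avoids a spurious reopen right after the handler's own size-based rotation, which would otherwise look like an external rename.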
anti	4b15b7eb35	fix: chown log files to sudo-invoking user so non-root API can append 'sudo decnet deploy' needs root for MACVLAN, but the log files it creates (decnet.log and decnet.system.log) end up owned by root. A subsequent non-root 'decnet api' then crashes on PermissionError appending to them. New decnet.privdrop helper reads SUDO_UID/SUDO_GID and chowns files/dirs back to the invoking user. Best-effort: no-op when not root, not under sudo, path missing, or chown fails. Applied at both log-file creation sites (config.py system log, logging/file_handler.py syslog file).	2026-04-17 13:39:09 -04:00
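A minimal sketch of the best-effort chown helper described above. `chown_to_invoker` is a hypothetical name, since the commit names only the `decnet.privdrop` module. The no-op conditions follow the commit message.

```python
import os


def chown_to_invoker(path: str) -> None:
    """Best-effort: give *path* back to the user who invoked sudo.

    No-op when not root, not under sudo, or when chown fails (e.g. path missing).
    """
    if not hasattr(os, "geteuid") or os.geteuid() != 0:
        return  # not root (or not a POSIX platform): nothing to undo
    uid, gid = os.environ.get("SUDO_UID"), os.environ.get("SUDO_GID")
    if not uid or not gid:
        return  # root, but not via sudo
    try:
        os.chown(path, int(uid), int(gid))
    except (OSError, ValueError):
        pass  # never let log setup fail over file ownership
```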
anti	140d2fbaad	fix: gate embedded sniffer behind DECNET_EMBED_SNIFFER (default off) The API's lifespan unconditionally spawned a MACVLAN sniffer task, which duplicated the standalone 'decnet sniffer --daemon' process that 'decnet deploy' always starts — causing two workers to sniff the same interface, double events, and wasted CPU. Mirror the existing DECNET_EMBED_PROFILER pattern: sniffer is OFF by default, opt in explicitly. Static regression tests guard against accidental removal of the gate.	2026-04-17 13:35:43 -04:00
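A minimal sketch of an opt-in environment gate like `DECNET_EMBED_SNIFFER`. The helper name `env_flag` and the accepted truthy spellings are assumptions. The commit only states that the flag is off by default and mirrors `DECNET_EMBED_PROFILER`, which is enabled with `=true`.

```python
import os

# Assumed truthy spellings; anything else (including unset) means "off".
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def env_flag(name: str, default: bool = False) -> bool:
    """Read an opt-in boolean from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Hypothetical use inside the API lifespan:
#     if env_flag("DECNET_EMBED_SNIFFER"):
#         tasks.append(asyncio.create_task(sniffer_worker(...)))
```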
anti	064c8760b6	fix: memray run needs --trace-python-allocators for frame attribution Without it, 'Total number of frames seen: 0' in memray stats and flamegraphs render empty / C-only. Also added --follow-fork so uvicorn workers spawned as child processes are tracked.	2026-04-17 13:24:55 -04:00
anti	6572c5cbaf	added: scripts/profile/view.sh — auto-pick newest artifact and open viewer Dispatches by extension: .prof -> snakeviz, memray .bin -> memray flamegraph (overridable via VIEW=table|tree|stats|summary|leaks), .svg/.html -> xdg-open. Positional arg can be a file path or a type keyword (cprofile, memray, pyspy, pyinstrument).	2026-04-17 13:20:05 -04:00
anti	ba448bae13	docs: py-spy 0.4.1 lacks Python 3.14 support; wrapper aborts early Root cause of 'No python processes found in process <pid>': py-spy needs per-release ABI knowledge and 0.4.1 (latest PyPI) predates 3.14. Wrapper now detects the interpreter and points users at pyinstrument/memray/cProfile.	2026-04-17 13:17:23 -04:00
anti	1a18377b0a	fix: mysql url builder tests expect asyncmy, not aiomysql The builder in decnet/web/db/mysql/database.py emits 'mysql+asyncmy://' URLs (asyncmy is the declared dep in pyproject.toml). Tests were stale from a prior aiomysql era.	2026-04-17 13:13:36 -04:00
anti	319c1dbb61	added: profiling toolchain (py-spy, pyinstrument, pytest-benchmark, memray, snakeviz) New `profile` optional-deps group, opt-in Pyinstrument ASGI middleware gated by DECNET_PROFILE_REQUESTS, bench marker + tests/perf/ micro-benchmarks for repository hot paths, and scripts/profile/ helpers for py-spy/cProfile/memray.	2026-04-17 13:13:00 -04:00
anti	c1d8102253	modified: DEVELOPMENT roadmap. one step closer to v1	2026-04-16 11:39:07 -04:00
anti	49f3002c94	added: docs; modified: .gitignore	2026-04-16 02:10:38 -04:00
anti	9b59f8672e	chores: cleanup; added: viteconfig	2026-04-16 02:09:30 -04:00
anti	296979003d	fix: pytest -m live works without extra flags Root cause: test_schemathesis.py mutates decnet.web.auth.SECRET_KEY at module-level import time, poisoning JWT verification for all other tests in the same process — even when fuzz tests are deselected. - Add pytest_ignore_collect hook in tests/api/conftest.py to skip collecting test_schemathesis.py unless -m fuzz is selected - Add --dist loadscope to addopts so xdist groups by module (protects module-scoped fixtures in live tests) - Remove now-unnecessary xdist_group markers from live test classes	2026-04-16 01:55:38 -04:00
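A minimal sketch of a `pytest_ignore_collect` hook in `conftest.py` that skips `test_schemathesis.py` unless `fuzz` is selected. The marker-expression check is a simplified assumption: it looks for a bare `fuzz` token not directly preceded by `not`, rather than evaluating the full `-m` expression.

```python
from pathlib import Path
from typing import Any, Optional


def _fuzz_selected(markexpr: str) -> bool:
    """Rough check: does the -m expression positively select 'fuzz'?"""
    tokens = markexpr.replace("(", " ").replace(")", " ").split()
    return any(
        tok == "fuzz" and (i == 0 or tokens[i - 1] != "not")
        for i, tok in enumerate(tokens)
    )


def pytest_ignore_collect(collection_path: Path, config: Any) -> Optional[bool]:
    if collection_path.name != "test_schemathesis.py":
        return None  # no opinion: let pytest and other plugins decide
    markexpr = config.getoption("markexpr", default="") or ""
    # True means "do not even import this module", so its import-time
    # mutation of SECRET_KEY never runs in non-fuzz sessions.
    return None if _fuzz_selected(markexpr) else True
```

The hook returns `None` rather than `False` when collection should proceed, because `pytest_ignore_collect` is a first-result hook and any non-`None` value ends the lookup.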
anti	89099b903d	fix: resolve schemathesis and live test failures - Add 403 response to all RBAC-gated endpoints (schemathesis UndefinedStatusCode) - Add 400 response to all endpoints accepting JSON bodies (malformed input) - Add required 'title' field to schemathesis.toml for schemathesis 4.15+ - Add xdist_group markers to live tests with module-scoped fixtures to prevent xdist from distributing them across workers (fixture isolation)	2026-04-16 01:39:04 -04:00
anti	29578d9d99	fix: resolve all ruff and bandit lint/security issues - Remove unused Optional import (F401) in telemetry.py - Move imports above module-level code (E402) in web/db/models.py - Default API/web hosts to 127.0.0.1 instead of 0.0.0.0 (B104) - Add usedforsecurity=False to MD5 calls in JA3/HASSH fingerprinting (B324) - Annotate intentional try/except/pass blocks with nosec (B110) - Remove stale nosec comments that no longer suppress anything	2026-04-16 01:04:57 -04:00
anti	70d8ffc607	feat: complete OTEL tracing across all services with pipeline bridge and docs Extends tracing to every remaining module: all 23 API route handlers, correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp), profiler behavioral analysis, logging subsystem, engine, and mutator. Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords. Includes development/docs/TRACING.md with full span reference (76 spans), pipeline propagation architecture, quick start guide, and troubleshooting.	2026-04-16 00:58:08 -04:00
anti	04db13afae	feat: cross-stage trace propagation and granular per-event spans Collector now creates a span per event and injects W3C trace context into JSON records. Ingester extracts that context and creates child spans, connecting the full event journey: collector -> ingester -> db.add_log + extract_bounty -> db.add_bounty. Profiler now creates per-IP spans inside update_profiles with rich attributes (event_count, is_traversal, bounty_count, command_count). Traces in Jaeger now show the complete execution map from capture through ingestion and profiling.	2026-04-15 23:52:13 -04:00
anti	d1a88e75bd	fix: dynamic TracedRepository proxy + disable tracing in test suite Replace brittle explicit method-by-method proxy with __getattr__-based dynamic proxy that forwards all args/kwargs to the inner repo. Fixes TypeError on get_logs_after_id() where concrete repo accepts extra kwargs beyond the ABC signature. Pin DECNET_DEVELOPER_TRACING=false in conftest.py so .env.local settings don't leak into the test suite.	2026-04-15 23:46:46 -04:00
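A minimal sketch of a `__getattr__`-based proxy that forwards all positional and keyword arguments. `start_span` is a stand-in for an OpenTelemetry `tracer.start_as_current_span(...)` call so the sketch runs without OTEL installed, and the `db.<method>` span naming is an assumption.

```python
import asyncio
import contextlib
import functools
import inspect
from typing import Any, Iterator, List

SPANS: List[str] = []  # records span names so the stand-in tracer is observable


@contextlib.contextmanager
def start_span(name: str) -> Iterator[None]:
    # Stand-in for tracer.start_as_current_span(name).
    SPANS.append(name)
    yield


class TracedRepository:
    """Wraps any repository; no per-method boilerplate to drift out of sync."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the proxy itself.
        attr = getattr(self._inner, name)
        if not inspect.iscoroutinefunction(attr):
            return attr  # sync methods and plain attributes pass through
        @functools.wraps(attr)
        async def traced(*args: Any, **kwargs: Any) -> Any:
            with start_span(f"db.{name}"):
                return await attr(*args, **kwargs)
        return traced
```

Because the wrapper forwards `*args, **kwargs` unchanged, a concrete repository that accepts extra keyword arguments beyond the ABC signature keeps working through the proxy.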
anti	65ddb0b359	feat: add OpenTelemetry distributed tracing across all DECNET services Gated by DECNET_DEVELOPER_TRACING env var (default off, zero overhead). When enabled, traces flow through FastAPI routes, background workers (collector, ingester, profiler, sniffer, prober), engine/mutator operations, and all DB calls via TracedRepository proxy. Includes Jaeger docker-compose for local dev and 18 unit tests.	2026-04-15 23:23:13 -04:00
anti	b437bc8eec	fix: use unbuffered reads in proxy for SSE streaming resp.read(4096) blocks until 4096 bytes accumulate, which stalls SSE events (~100-500 bytes each) in the proxy buffer indefinitely. Switch to read1() which returns bytes immediately available without waiting for more. Also disable the 120s socket timeout for SSE connections.	2026-04-15 23:03:03 -04:00
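A minimal sketch of an unbuffered forwarding loop, assuming the upstream body object exposes `read1()` (both `io.BufferedReader` and `http.client.HTTPResponse` do). `pump` is a hypothetical name. `read1(n)` performs at most one underlying read and returns whatever it got, whereas `read(n)` keeps reading until `n` bytes or EOF.

```python
import io
from typing import Callable


def pump(src: io.BufferedIOBase, write: Callable[[bytes], object], chunk: int = 4096) -> None:
    """Forward bytes as soon as they arrive instead of waiting for a full chunk."""
    while True:
        data = src.read1(chunk)  # returns what is available, up to `chunk` bytes
        if not data:  # b"" means the upstream closed (EOF)
            break
        write(data)
```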
anti	a1ca5d699b	fix: use dedicated thread pools for collector and sniffer workers The collector spawned one permanent thread per Docker container via asyncio.to_thread(), saturating the default asyncio executor. This starved short-lived to_thread(load_state) calls in get_deckies() and get_stats_summary(), causing the SSE stream and deckies endpoints to hang indefinitely while other DB-only endpoints worked fine. Give the collector and sniffer their own ThreadPoolExecutor so they never compete with the default pool.	2026-04-15 22:57:03 -04:00
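A minimal sketch of routing long-lived blocking work to a dedicated pool. The pool name, size, and `run_in_collector_pool` helper are hypothetical. `asyncio.to_thread()` always uses the loop's default executor, so a separate pool has to be passed explicitly to `loop.run_in_executor()`.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Hypothetical size: one slot per long-lived per-container log-tail thread.
_COLLECTOR_POOL = ThreadPoolExecutor(max_workers=32, thread_name_prefix="collector")


async def run_in_collector_pool(fn: Callable[..., Any], *args: Any) -> Any:
    """Run blocking fn(*args) on the collector's own threads, not the default pool."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_COLLECTOR_POOL, fn, *args)
```

Short calls such as `to_thread(load_state)` keep using the default executor, so permanent collector threads can no longer exhaust the workers they depend on.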
anti	e9d151734d	feat: deduplicate bounties on (bounty_type, attacker_ip, payload) Before inserting a bounty, check whether an identical row already exists. Drops silent duplicates to prevent DB saturation from aggressive scanners.	2026-04-15 18:02:52 -04:00
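A minimal sketch of check-before-insert deduplication on `(bounty_type, attacker_ip, payload)`, using stdlib `sqlite3` in place of the project's async repository. The table name `bounty` and the helper name are hypothetical.

```python
import sqlite3


def add_bounty_if_new(
    conn: sqlite3.Connection, bounty_type: str, attacker_ip: str, payload: str
) -> bool:
    """Insert the bounty unless an identical row already exists; return True if inserted."""
    exists = conn.execute(
        "SELECT 1 FROM bounty WHERE bounty_type = ? AND attacker_ip = ? AND payload = ? LIMIT 1",
        (bounty_type, attacker_ip, payload),
    ).fetchone()
    if exists:
        return False  # silent duplicate from a repeating scanner
    conn.execute(
        "INSERT INTO bounty (bounty_type, attacker_ip, payload) VALUES (?, ?, ?)",
        (bounty_type, attacker_ip, payload),
    )
    conn.commit()
    return True
```

Check-then-insert is not atomic across concurrent writers. A UNIQUE index would be stricter, but on MySQL a TEXT payload column can only be indexed with a prefix length, which may explain a query-level check.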
anti	0ab97d0ade	docs: document decnet domain models and fleet transformation	2026-04-15 18:01:27 -04:00
anti	60de16be84	docs: document decnet collector worker	2026-04-15 17:56:24 -04:00
anti	82ec7f3117	fix: gate embedded profiler behind DECNET_EMBED_PROFILER to prevent dual-instance cursor conflict decnet deploy spawns a standalone profiler daemon AND api.py was also starting attacker_profile_worker as an asyncio task inside the web server. Both instances shared the same attacker_worker_cursor key in the state table, causing a race where one instance could skip events already claimed by the other or overwrite the cursor mid-batch. Default is now OFF (embedded profiler disabled). The standalone daemon started by decnet deploy is the single authoritative instance. Set DECNET_EMBED_PROFILER=true only when running decnet api in isolation without a full deploy.	2026-04-15 17:49:18 -04:00
anti	11d749f13d	fix: wire prober tcpfp_fingerprint events into sniffer_rollup for OS/hop detection The active prober emits tcpfp_fingerprint events with TTL, window, MSS etc. from the attacker's SYN-ACK. These were invisible to the behavioral profiler for two reasons: 1. target_ip (prober's field name for attacker IP) was not in _IP_FIELDS in collector/worker.py or correlation/parser.py, so the profiler re-parsed raw_lines and got attacker_ip=None, never attributing prober events to the attacker profile. 2. sniffer_rollup only handled tcp_syn_fingerprint (passive sniffer) and ignored tcpfp_fingerprint (active prober). Prober events use different field names: window_size/window_scale/sack_ok vs window/wscale/has_sack. Changes: - Add target_ip to _IP_FIELDS in collector and parser - Add _PROBER_TCPFP_EVENT and _INITIAL_TTL table to behavioral.py - sniffer_rollup now processes tcpfp_fingerprint: maps field names, derives OS from TTL via _os_from_ttl, computes hop_distance = initial_ttl - observed - Expand prober DEFAULT_TCPFP_PORTS to [22,80,443,8080,8443,445,3389] for better SYN-ACK coverage on attacker machines - Add 4 tests covering prober OS detection, hop distance, and field mapping	2026-04-15 17:36:40 -04:00
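A minimal sketch of the prober field mapping and the TTL-based derivation `hop_distance = initial_ttl - observed`. The field-name mapping comes from the commit. The initial-TTL table is an assumption using common default TTLs, not the contents of `_INITIAL_TTL` in `behavioral.py`.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed defaults: the smallest common initial TTL >= the observed TTL wins.
_INITIAL_TTL = ((64, "linux/unix"), (128, "windows"), (255, "network device"))

# Prober field names -> passive-sniffer field names, as listed in the commit.
_PROBER_FIELD_MAP = {"window_size": "window", "window_scale": "wscale", "sack_ok": "has_sack"}


def normalize_prober_event(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Rename tcpfp_fingerprint fields so sniffer_rollup can treat both sources alike."""
    return {_PROBER_FIELD_MAP.get(k, k): v for k, v in fields.items()}


def os_and_hops(observed_ttl: int) -> Tuple[Optional[str], Optional[int]]:
    """Guess the OS family from TTL and compute hops = initial_ttl - observed."""
    for initial, guess in _INITIAL_TTL:
        if observed_ttl <= initial:
            return guess, initial - observed_ttl
    return None, None
```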
anti	a4798946c1	fix: add remote_addr to IP field lookup so http/https/k8s events are attributed correctly Templates for http, https, k8s, and docker_api log the client IP as remote_addr (Flask's request.remote_addr) instead of src_ip. The collector and correlation parser only checked src_ip/src/client_ip/remote_ip/ip, so every request event from those services was stored with attacker_ip="Unknown" and never associated with any attacker profile. Adding remote_addr to _IP_FIELDS in both collector/worker.py and correlation/parser.py fixes attribution. The profiler cursor was also reset to 0 so the worker performs a cold rebuild and re-ingests existing events with the corrected field mapping.	2026-04-15 17:23:33 -04:00
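A minimal sketch of an ordered IP-field lookup with `remote_addr` added. The field names come from the commit message. The lookup order and function name are assumptions. The later commit 11d749f13d appends `target_ip` to the same list.

```python
from typing import Mapping, Optional

# Candidate keys for the client IP; first non-empty match wins (order assumed).
_IP_FIELDS = ("src_ip", "src", "client_ip", "remote_ip", "ip", "remote_addr")


def extract_attacker_ip(fields: Mapping[str, str]) -> Optional[str]:
    for key in _IP_FIELDS:
        value = fields.get(key)
        if value:
            return value
    return None  # caller records "Unknown"
```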
anti	d869eb3d23	docs: document decnet engine orchestrator	2026-04-15 17:13:13 -04:00
anti	89887ec6fd	fix: serialize HTTP headers as JSON so tool detection and bounty extraction work templates/decnet_logging.py calls str(v) on all SD-PARAM values, turning a headers dict into Python repr ('{'User-Agent': ...}') rather than JSON. detect_tools_from_headers() called json.loads() on that string and silently swallowed the error, returning [] for every HTTP event. Same bug prevented the ingester from extracting User-Agent bounty fingerprints. - templates/http/server.py: wrap headers dict in json.dumps() before passing to syslog_line so the value is a valid JSON string in the syslog record - behavioral.py: add ast.literal_eval fallback for existing DB rows that were stored with the old Python repr format - ingester.py: parse headers as JSON string in _extract_bounty so User-Agent fingerprints are stored correctly going forward - tests: add test_json_string_headers and test_python_repr_headers_fallback to exercise both formats in detect_tools_from_headers	2026-04-15 17:03:52 -04:00
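A minimal sketch of reading a headers value that may be stored either as JSON (new rows) or as a Python `repr` (old rows), using the `ast.literal_eval` fallback the commit describes. `parse_headers` is a hypothetical name.

```python
import ast
import json
from typing import Any, Dict


def parse_headers(raw: Any) -> Dict[str, Any]:
    """Decode a headers value stored as JSON or as a legacy str(dict)."""
    if isinstance(raw, dict):
        return raw
    try:
        value = json.loads(raw)  # new rows: json.dumps() output
    except (TypeError, ValueError):
        try:
            value = ast.literal_eval(raw)  # old rows: "{'User-Agent': ...}"
        except (TypeError, ValueError, SyntaxError):
            return {}
    return value if isinstance(value, dict) else {}
```

`ast.literal_eval` only accepts Python literals, so it cannot execute code from a stored attacker-controlled string, unlike `eval`.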
anti	02e73a19d5	fix: promote TCP-fingerprinted nmap to tool_guesses (detects -sC sans HTTP)	2026-04-15 16:44:45 -04:00
anti	b3efd646f6	feat: replace tool attribution stat with dedicated DETECTED TOOLS block	2026-04-15 16:37:54 -04:00
anti	2ec64ef2ef	fix: rename BEHAVIOR label to ATTACK PATTERN for clarity	2026-04-15 16:36:19 -04:00
anti	e67624452e	feat: centralize microservice logging to DECNET_SYSTEM_LOGS (default: decnet.system.log)	2026-04-15 16:23:28 -04:00
anti	e05b632e56	feat: update AttackerDetail UI for new behavior classes and multi-tool badges	2026-04-15 15:49:03 -04:00
anti	c8f05df4d9	feat: overhaul behavioral profiler — multi-tool detection, improved classification, TTL OS fallback	2026-04-15 15:47:02 -04:00
anti	935a9a58d2	fix: reopen collector log handles after deletion or log rotation Replaces the single persistent open() with inode-based reopen logic. If decnet.log or decnet.json is deleted or renamed by logrotate, the next write detects the stale inode, closes the old handle, and creates a fresh file — preventing silent data loss to orphaned inodes.	2026-04-15 14:04:54 -04:00
anti	63efe6c7ba	fix: persist ingester position and profiler cursor across restarts - Ingester now loads byte-offset from DB on startup (key: ingest_worker_position) and saves it after each batch — prevents full re-read on every API restart - On file truncation/rotation the saved offset is reset to 0 - Profiler worker now loads last_log_id from DB on startup — every restart becomes an incremental update instead of a full cold rebuild - Updated all affected tests to mock get_state/set_state; added new tests covering position restore, set_state call, truncation reset, and cursor restore/cold-start paths	2026-04-15 13:58:12 -04:00
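A minimal sketch of resuming from a persisted byte offset with a reset on truncation. A plain dict stands in for the DB-backed `get_state`/`set_state` calls, and `read_new_lines` is a hypothetical name.

```python
import os
from typing import Dict, List


def read_new_lines(path: str, state: Dict[str, int], key: str = "ingest_worker_position") -> List[str]:
    """Return complete lines appended since the saved offset, then advance it."""
    pos = state.get(key, 0)
    if os.path.getsize(path) < pos:
        pos = 0  # file shrank: truncated, or replaced by rotation
    with open(path, "rb") as f:
        f.seek(pos)
        data = f.read()
    end = data.rfind(b"\n") + 1  # consume only complete lines
    state[key] = pos + end  # a partial trailing line is re-read next time
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

A size check alone misses a rotated file that has already grown past the old offset before the next poll; an inode comparison would catch that case as well.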
anti	314e6c6388	fix: remove event-loop-blocking cold start; unify profiler to cursor-based incremental Cold start fetched all logs in one bulk query then processed them in a tight synchronous loop with no yields, blocking the asyncio event loop for seconds on datasets of 30K+ rows. This stalled every concurrent await — including the SSE stream generator's initial DB calls — causing the dashboard to show INITIALIZING SENSORS indefinitely. Changes: - Drop _cold_start() and get_all_logs_raw(); uninitialized state now runs the same cursor loop as incremental, starting from last_log_id=0 - Yield to the event loop after every _BATCH_SIZE rows (asyncio.sleep(0)) - Add SSE keepalive comment as first yield so the connection flushes before any DB work begins - Add Cache-Control/X-Accel-Buffering headers to StreamingResponse	2026-04-15 13:46:42 -04:00
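A minimal sketch of the unified cursor loop, where a cold start is simply `last_log_id=0`. `fetch_after` and `handle` are hypothetical callables standing in for the repository query and the per-row profiling step.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Row = Dict[str, Any]


async def process_from_cursor(
    fetch_after: Callable[[int, int], Awaitable[List[Row]]],
    handle: Callable[[Row], None],
    last_log_id: int = 0,
    batch_size: int = 500,
) -> int:
    """Process rows with id > last_log_id in batches; return the new cursor."""
    while True:
        rows = await fetch_after(last_log_id, batch_size)
        if not rows:
            return last_log_id
        for row in rows:
            handle(row)  # synchronous work: no other coroutine runs here
            last_log_id = row["id"]
        await asyncio.sleep(0)  # yield so SSE and other coroutines get a turn
```

Without the `sleep(0)`, a fetch that completes without suspending would let the whole backlog run as one uninterrupted stretch of the event loop.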
anti	12aa98a83c	fix: migrate TEXT→MEDIUMTEXT for attacker/state columns on MySQL Existing MySQL databases hit a DataError when the commands/fingerprints JSON blobs exceed 64 KiB (TEXT limit). _BIG_TEXT emits MEDIUMTEXT only at CREATE TABLE time; create_all() is a no-op on existing columns. Add MySQLRepository._migrate_column_types() that queries information_schema and issues ALTER TABLE … MODIFY COLUMN … MEDIUMTEXT for the five affected columns (commands, fingerprints, services, deckies, state.value) whenever they are still TEXT. Called from an overridden initialize() after _migrate_attackers_table() and before create_all(). Add tests/test_mysql_migration.py covering: ALTER issued for TEXT columns, no-op for already-MEDIUMTEXT, idempotency, DEFAULT clause correctness, and initialize() call order.	2026-04-15 12:59:54 -04:00
anti	7dbc71d664	test: add profiler behavioral analysis and RBAC endpoint tests - test_profiler_behavioral.py: attacker behavior pattern matching tests - api/test_rbac.py: comprehensive RBAC role separation tests - api/config/: configuration API endpoint tests (CRUD, reinit, user management)	2026-04-15 12:51:38 -04:00
anti	dae3687089	test: add fingerprinting and TCP analysis tests - test_sniffer_p0f.py: p0f passive OS fingerprinting tests - test_sniffer_tcp_fingerprint.py: TCP fingerprinting accuracy tests - test_sniffer_retransmit.py: retransmission detection and analysis	2026-04-15 12:51:35 -04:00
anti	187194786f	test: add MySQL backend integration tests - test_mysql_backend_live.py: live integration tests for MySQL connections - test_mysql_histogram_sql.py: dialect-specific histogram query tests - test_mysql_url_builder.py: MySQL connection string construction - mysql_spinup.sh: Docker spinup script for local MySQL testing	2026-04-15 12:51:33 -04:00
anti	9de320421e	test: add repository factory and CLI db-reset tests - test_factory.py: verify database factory selects correct backend - test_cli_db_reset.py: test CLI database reset functionality	2026-04-15 12:51:29 -04:00


389 Commits