13 Commits

Author SHA1 Message Date
32340bea0d perf: migrate hot-path JSON serialization to orjson
stdlib json was FastAPI's default. Every response body, every SSE frame,
and every add_log/state/payload write paid the stdlib encode cost.

- pyproject.toml: add orjson>=3.10 as a core dep.
- decnet/web/api.py: default_response_class=ORJSONResponse on the
  FastAPI app, so every endpoint return goes through orjson without
  touching call sites. Explicit JSONResponse sites in the validation
  exception handlers migrated to ORJSONResponse for consistency.
- health endpoint's explicit JSONResponse → ORJSONResponse.
- SSE stream (api_stream_events.py): 6 json.dumps call sites →
  orjson.dumps(...).decode() — the per-event frames that fire on every
  sse tick.
- sqlmodel_repo.py: encode sites on the log-insert path switched to
  orjson (fields, payload, state value). Parser sites (json.loads)
  left as-is for now — not on the measured hot path.
2026-04-17 15:07:28 -04:00
467511e997 db: switch MySQL driver to asyncmy, env-tune pool, serialize DDL
- aiomysql → asyncmy on both sides of the URL/import (faster, maintained).
- Pool sizing now reads DECNET_DB_POOL_SIZE / MAX_OVERFLOW / RECYCLE /
  PRE_PING for both SQLite and MySQL engines so stress runs can bump
  without code edits.
- MySQL initialize() now wraps schema DDL in a GET_LOCK advisory lock so
  concurrent uvicorn workers racing create_all() don't hit 'Table was
  skipped since its definition is being modified by concurrent DDL'.
- sqlite & mysql repo get_log_histogram use the shared _session() helper
  instead of session_factory() for consistency with the rest of the repo.
- SSE stream_events docstring updated to asyncmy.
2026-04-17 15:01:49 -04:00
89099b903d fix: resolve schemathesis and live test failures
- Add 403 response to all RBAC-gated endpoints (schemathesis UndefinedStatusCode)
- Add 400 response to all endpoints accepting JSON bodies (malformed input)
- Add required 'title' field to schemathesis.toml for schemathesis 4.15+
- Add xdist_group markers to live tests with module-scoped fixtures to
  prevent xdist from distributing them across workers (fixture isolation)
2026-04-16 01:39:04 -04:00
70d8ffc607 feat: complete OTEL tracing across all services with pipeline bridge and docs
Extends tracing to every remaining module: all 23 API route handlers,
correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp),
profiler behavioral analysis, logging subsystem, engine, and mutator.

Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on
the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace
correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords.

Includes development/docs/TRACING.md with full span reference (76 spans),
pipeline propagation architecture, quick start guide, and troubleshooting.
2026-04-16 00:58:08 -04:00
c8f05df4d9 feat: overhaul behavioral profiler — multi-tool detection, improved classification, TTL OS fallback 2026-04-15 15:47:02 -04:00
314e6c6388 fix: remove event-loop-blocking cold start; unify profiler to cursor-based incremental
Cold start fetched all logs in one bulk query then processed them in a tight
synchronous loop with no yields, blocking the asyncio event loop for seconds
on datasets of 30K+ rows. This stalled every concurrent await — including the
SSE stream generator's initial DB calls — causing the dashboard to show
INITIALIZING SENSORS indefinitely.

Changes:
- Drop _cold_start() and get_all_logs_raw(); uninitialized state now runs the
  same cursor loop as incremental, starting from last_log_id=0
- Yield to the event loop after every _BATCH_SIZE rows (asyncio.sleep(0))
- Add SSE keepalive comment as first yield so the connection flushes before
  any DB work begins
- Add Cache-Control/X-Accel-Buffering headers to StreamingResponse
2026-04-15 13:46:42 -04:00
0ee23b8700 refactor: enforce RBAC decorators on all API endpoints
- Add @require_role() decorators to all GET/POST/PUT endpoints
- Centralize role-based access control per memory: RBAC null-role bug required server-side gating
- Admin (manage_admins), Editor (write ops), Viewer (read ops), Public endpoints
- Removes client-side role checks as per memory: server-side UI gating is mandatory
2026-04-15 12:51:05 -04:00
035499f255 feat: add component-aware RFC 5424 application logging system
- Modify Rfc5424Formatter to read decnet_component from LogRecord
  and use it as RFC 5424 APP-NAME field (falls back to 'decnet')
- Add get_logger(component) factory in decnet/logging/__init__.py
  with _ComponentFilter that injects decnet_component on each record
- Wire all five layers to their component tag:
    cli -> 'cli', engine -> 'engine', api -> 'api' (api.py, ingester,
    routers), mutator -> 'mutator', collector -> 'collector'
- Add structured INFO/DEBUG/WARNING/ERROR log calls throughout each
  layer per the defined vocabulary; DEBUG calls are suppressed unless
  DECNET_DEVELOPER=true
- Add tests/test_logging.py covering factory, filter, formatter
  component-awareness, fallback behaviour, and level gating
2026-04-13 07:39:01 -04:00
b2e4706a14 Refactor: implemented Repository Factory and Async Mutator Engine. Decoupled storage logic and enforced Dependency Injection across CLI and Web API. Updated documentation.
Some checks failed
CI / Lint (ruff) (push) Successful in 12s
CI / SAST (bandit) (push) Successful in 13s
CI / Dependency audit (pip-audit) (push) Successful in 22s
CI / Test (Standard) (3.11) (push) Failing after 54s
CI / Test (Standard) (3.12) (push) Successful in 1m35s
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-12 07:48:17 -04:00
6b8392102e fix: emit stats/histogram snapshot on SSE connect; remove polling api.get('/stats') from Dashboard 2026-04-09 19:23:24 -04:00
d2a569496d fix: add get_stream_user dependency for SSE endpoint; allow query-string token for EventSource 2026-04-09 19:20:38 -04:00
016115a523 fix: clear all addressable technical debt (DEBT-005 through DEBT-025)
Security:
- DEBT-008: remove query-string token auth; header-only Bearer now enforced
- DEBT-013: add regex constraint ^[a-z0-9\-]{1,64}$ on decky_name path param
- DEBT-015: stop leaking raw exception detail to API clients; log server-side
- DEBT-016: validate search (max_length=512) and datetime params with regex

Reliability:
- DEBT-014: wrap SSE event_generator in try/except; yield error frame on failure
- DEBT-017: emit log.warning/error on DB init retry; silent failures now visible

Observability / Docs:
- DEBT-020: add 401/422 response declarations to all route decorators

Infrastructure:
- DEBT-018: add HEALTHCHECK to all 24 template Dockerfiles
- DEBT-019: add USER decnet + setcap cap_net_bind_service to all 24 Dockerfiles
- DEBT-024: bump Redis template version 7.0.12 → 7.2.7

Config:
- DEBT-012: validate DECNET_API_PORT and DECNET_WEB_PORT range (1-65535)

Code quality:
- DEBT-010: delete 22 duplicate decnet_logging.py copies; deployer injects canonical
- DEBT-022: closed as false positive (print only in module docstring)
- DEBT-009: closed as false positive (templates already use structured syslog_line)

Build:
- DEBT-025: generate requirements.lock via pip freeze

Testing:
- DEBT-005/006/007: comprehensive test suite added across tests/api/
- conftest: in-memory SQLite + StaticPool + monkeypatched session_factory
- fuzz mark added; default run excludes fuzz; -n logical parallelism

DEBT.md updated: 23/25 items closed; DEBT-011 (Alembic) and DEBT-023 (digest pinning) remain
2026-04-09 19:02:51 -04:00
29a2cf2738 refactor: modularize API routes into separate files and clean up dependencies 2026-04-09 11:58:57 -04:00