Commit Graph

7 Commits

Author SHA1 Message Date
4ea1c2ff4f fix(health): move Docker client+ping off the event loop
Under CPU saturation the sync docker.from_env()/ping() calls could miss
their socket timeout, cache _docker_healthy=False, and return 503 for
the full 5s TTL window. Both calls now run on a thread so the event
loop keeps serving other requests while Docker is being probed.
2026-04-17 15:43:51 -04:00
32340bea0d perf: migrate hot-path JSON serialization to orjson
stdlib json was FastAPI's default. Every response body, every SSE frame,
and every add_log/state/payload write paid the stdlib encode cost.

- pyproject.toml: add orjson>=3.10 as a core dep.
- decnet/web/api.py: default_response_class=ORJSONResponse on the
  FastAPI app, so every endpoint return goes through orjson without
  touching call sites. Explicit JSONResponse sites in the validation
  exception handlers migrated to ORJSONResponse for consistency.
- health endpoint's explicit JSONResponse → ORJSONResponse.
- SSE stream (api_stream_events.py): 6 json.dumps call sites →
  orjson.dumps(...).decode() — the per-event frames that fire on every
  sse tick.
- sqlmodel_repo.py: encode sites on the log-insert path switched to
  orjson (fields, payload, state value). Parser sites (json.loads)
  left as-is for now — not on the measured hot path.
2026-04-17 15:07:28 -04:00
f1e14280c0 perf: 1s TTL cache for /health DB probe and /config state reads
Locust hit /health and /config on every @task(3), so each request was
firing repo.get_total_logs() and two repo.get_state() calls against
aiosqlite — filling the driver queue for data that changes on the order
of seconds, not milliseconds.

Both caches follow the shape already used by the existing Docker cache:
- asyncio.Lock with double-checked TTL so concurrent callers collapse
  into one DB hit per 1s window.
- _reset_* helpers called from tests/api/conftest.py::setup_db so the
  module-level cache can't leak across tests.

tests/test_health_config_cache.py asserts 50 concurrent callers
produce exactly 1 repo call, and the cache expires after TTL.
2026-04-17 15:05:18 -04:00
931f33fb06 perf: cache Docker daemon ping in /health (5s TTL)
Creating a new docker.from_env() client per /health request opened a
fresh unix-socket connection each time. Under load that's wasteful and
hammers dockerd.

Keep a module-level client + last-check timestamp; actually ping every
5 seconds, return cached state in between. Reset helper provided for
tests.
2026-04-17 15:01:53 -04:00
89099b903d fix: resolve schemathesis and live test failures
- Add 403 response to all RBAC-gated endpoints (schemathesis UndefinedStatusCode)
- Add 400 response to all endpoints accepting JSON bodies (malformed input)
- Add required 'title' field to schemathesis.toml for schemathesis 4.15+
- Add xdist_group markers to live tests with module-scoped fixtures to
  prevent xdist from distributing them across workers (fixture isolation)
2026-04-16 01:39:04 -04:00
70d8ffc607 feat: complete OTEL tracing across all services with pipeline bridge and docs
Extends tracing to every remaining module: all 23 API route handlers,
correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp),
profiler behavioral analysis, logging subsystem, engine, and mutator.

Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on
the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace
correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords.

Includes development/docs/TRACING.md with full span reference (76 spans),
pipeline propagation architecture, quick start guide, and troubleshooting.
2026-04-16 00:58:08 -04:00
a2ba7a7f3c feat: add /health endpoint for microservice monitoring
Checks database, background workers (ingestion, collector, attacker,
sniffer), and Docker daemon. Reports healthy/degraded/unhealthy status
with per-component details. Returns 503 when required services fail,
200 for healthy or degraded (only optional services down).
2026-04-14 16:56:20 -04:00