A module-level asyncio.Lock binds to the loop it was first awaited on.
Under pytest-anyio (and xdist) each test spins up a new loop; any later
test that hit /health or /config would wait on a lock owned by a dead
loop and the whole worker would hang.
Create the lock on first use and drop it in the test-reset helpers so a
fresh loop always gets a fresh lock.
Under CPU saturation the sync docker.from_env()/ping() calls could miss
their socket timeout, cache _docker_healthy=False, and return 503 for
the full 5s TTL window. Both calls now run on a thread so the event
loop keeps serving other requests while Docker is being probed.
stdlib json was FastAPI's default. Every response body, every SSE frame,
and every add_log/state/payload write paid the stdlib encode cost.
- pyproject.toml: add orjson>=3.10 as a core dep.
- decnet/web/api.py: default_response_class=ORJSONResponse on the
FastAPI app, so every endpoint return goes through orjson without
touching call sites. Explicit JSONResponse sites in the validation
exception handlers migrated to ORJSONResponse for consistency.
- health endpoint's explicit JSONResponse → ORJSONResponse.
- SSE stream (api_stream_events.py): 6 json.dumps call sites →
orjson.dumps(...).decode() — the per-event frames that fire on every
sse tick.
- sqlmodel_repo.py: encode sites on the log-insert path switched to
orjson (fields, payload, state value). Parser sites (json.loads)
left as-is for now — not on the measured hot path.
Locust hit /health and /config on every @task(3), so each request was
firing repo.get_total_logs() and two repo.get_state() calls against
aiosqlite — filling the driver queue for data that changes on the order
of seconds, not milliseconds.
Both caches follow the shape already used by the existing Docker cache:
- asyncio.Lock with double-checked TTL so concurrent callers collapse
into one DB hit per 1s window.
- _reset_* helpers called from tests/api/conftest.py::setup_db so the
module-level cache can't leak across tests.
tests/test_health_config_cache.py asserts 50 concurrent callers
produce exactly 1 repo call, and the cache expires after TTL.
Creating a new docker.from_env() client per /health request opened a
fresh unix-socket connection each time. Under load that's wasteful and
hammers dockerd.
Keep a module-level client + last-check timestamp; actually ping every
5 seconds, return cached state in between. Reset helper provided for
tests.
- Add 403 response to all RBAC-gated endpoints (schemathesis UndefinedStatusCode)
- Add 400 response to all endpoints accepting JSON bodies (malformed input)
- Add required 'title' field to schemathesis.toml for schemathesis 4.15+
- Add xdist_group markers to live tests with module-scoped fixtures to
prevent xdist from distributing them across workers (fixture isolation)
Extends tracing to every remaining module: all 23 API route handlers,
correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp),
profiler behavioral analysis, logging subsystem, engine, and mutator.
Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on
the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace
correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords.
Includes development/docs/TRACING.md with full span reference (76 spans),
pipeline propagation architecture, quick start guide, and troubleshooting.