feat: add OpenTelemetry distributed tracing across all DECNET services

Gated by DECNET_DEVELOPER_TRACING env var (default off, zero overhead).
When enabled, traces flow through FastAPI routes, background workers
(collector, ingester, profiler, sniffer, prober), engine/mutator
operations, and all DB calls via TracedRepository proxy.

Includes Jaeger docker-compose for local dev and 18 unit tests.
This commit is contained in:
2026-04-15 23:23:13 -04:00
parent b437bc8eec
commit 65ddb0b359
14 changed files with 687 additions and 4 deletions

View File

@@ -50,6 +50,10 @@ async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
log.error("DB failed to initialize after 5 attempts — startup may be degraded")
await asyncio.sleep(0.5)
# Conditionally enable OpenTelemetry tracing
from decnet.telemetry import setup_tracing
setup_tracing(app)
# Start background tasks only if not in contract test mode
if os.environ.get("DECNET_CONTRACT_TEST") != "true":
# Start background ingestion task
@@ -99,6 +103,8 @@ async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
pass
except Exception as exc:
log.warning("Task shutdown error: %s", exc)
from decnet.telemetry import shutdown_tracing
shutdown_tracing()
log.info("API shutdown complete")