Tracing and Profiling
anti edited this page 2026-04-18 06:08:46 -04:00

Three lenses. All off by default. All cost real CPU. Turn on, measure, turn off. For the perf history that justifies the defaults, see Performance Story. For every env var on this page, see Environment variables.


1. OpenTelemetry (distributed tracing)

Auto-instruments FastAPI, the internal workers, and outbound DB/HTTP calls. Emits OTLP over gRPC.

Enable

export DECNET_DEVELOPER_TRACING=true
export DECNET_OTEL_ENDPOINT=http://localhost:4317   # default

Bring up a local collector

A dev collector ships in the repo:

cd development
docker compose -f docker-compose.otel.yml up -d
# now point DECNET at it (default endpoint already matches)
DECNET_DEVELOPER_TRACING=true decnet deploy --mode unihost --deckies 2

Traces appear in whatever backend the compose file wires to (otel-collector → Jaeger/Tempo, depending on your compose overrides).

Install the extras

Tracing is an optional dependency group:

pip install -e '.[tracing]'
# or the umbrella:
pip install -e '.[dev]'

Without the extras, DECNET_DEVELOPER_TRACING=true is a no-op — the telemetry module detects missing packages and degrades silently. That behaviour is covered by tests/test_telemetry.py (TestTracingDisabled).
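The silent degradation can be pictured as a guard around the optional import. This is a minimal sketch, not DECNET's actual telemetry module: the helper name `tracing_active`, the accepted spelling of the flag, and the use of `opentelemetry` as the probe package are assumptions for illustration.

```python
import importlib.util
import os


def tracing_active(environ=None, probe_package="opentelemetry"):
    """Return True only if the flag is on AND the optional package is importable.

    Hypothetical helper: the real module may parse the flag differently.
    """
    environ = os.environ if environ is None else environ
    flag = environ.get("DECNET_DEVELOPER_TRACING", "").strip().lower()
    if flag != "true":
        return False  # off by default
    # find_spec() returns None for a missing top-level package, so no
    # ImportError escapes and the caller can fall back to a no-op.
    return importlib.util.find_spec(probe_package) is not None
```

With the flag set but the extras missing, the function returns False and nothing is raised, which mirrors the behaviour described above.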

Cost

Measured, not guessed. From development/profiles/:

  • 1500 concurrent users, tracing ON: p50=690 ms, 880 RPS
  • same load, tracing OFF: p50=340 ms, 993 RPS

That is roughly an 11% drop in RPS (993 → 880; put the other way, the untraced run delivers about 13% more throughput) and a doubled p50. Do not ship production deployments with this flag on. Full breakdown in Performance Story §4.
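The arithmetic behind those figures, worked from the two bullet points above. The percentage depends on which run you treat as the baseline, so both are shown.

```python
# Figures quoted above (1500 concurrent users).
rps_off, rps_on = 993, 880
p50_off_ms, p50_on_ms = 340, 690

rps_drop_pct = (rps_off - rps_on) / rps_off * 100  # baseline: tracing OFF
rps_gain_pct = (rps_off - rps_on) / rps_on * 100   # baseline: tracing ON
p50_ratio = p50_on_ms / p50_off_ms

print(f"RPS lost by turning tracing ON:    {rps_drop_pct:.1f}%")  # 11.4%
print(f"RPS gained by turning tracing OFF: {rps_gain_pct:.1f}%")  # 12.8%
print(f"p50 ratio (ON / OFF):              {p50_ratio:.2f}x")     # 2.03x
```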


2. Pyinstrument (per-request flamegraphs)

A sampling profiler mounted as ASGI middleware. Each request gets its own HTML flamegraph.

Enable

export DECNET_PROFILE_REQUESTS=true
export DECNET_PROFILE_DIR=profiles     # default, relative to $PWD
DECNET_PROFILE_REQUESTS=true decnet deploy --mode unihost --deckies 1

Hit an endpoint, then:

ls profiles/            # one *.html per request
xdg-open profiles/<most-recent>.html
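If the directory fills up quickly, picking the newest report by hand gets tedious. A small sketch that resolves `<most-recent>` by modification time; the function name `newest_profile` is hypothetical and not part of DECNET.

```python
from pathlib import Path


def newest_profile(profile_dir="profiles"):
    """Return the most recently modified *.html under profile_dir, or None."""
    reports = list(Path(profile_dir).glob("*.html"))
    # default=None covers the "no request profiled yet" case.
    return max(reports, key=lambda p: p.stat().st_mtime, default=None)
```

To open the result from Python, `webbrowser.open(newest_profile().resolve().as_uri())` should work on most desktops.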

Install the extras

pip install -e '.[profile]'

Cost

Pyinstrument samples the call stack about once per millisecond by default and writes an HTML report for every request. That is non-trivial CPU and disk pressure. Do not leave it on. It is a "one request at a time, look at it under a microscope" tool, not a production observability channel.
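To see where that cost comes from, here is roughly what a per-request profiling middleware has to do. This is a sketch under assumptions, not DECNET's middleware: `ProfileMiddleware`, `pyinstrument_factory`, and the file-naming scheme are invented for illustration. The only Pyinstrument calls used are `Profiler(async_mode="enabled")`, `start()`, `stop()`, and `output_html()`.

```python
import time
from pathlib import Path


class ProfileMiddleware:
    """Pure-ASGI middleware: one profiler session and one HTML file per request."""

    def __init__(self, app, profiler_factory, out_dir="profiles"):
        self.app = app
        self.profiler_factory = profiler_factory
        self.out_dir = Path(out_dir)

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and websocket traffic pass through unprofiled.
            await self.app(scope, receive, send)
            return
        profiler = self.profiler_factory()
        profiler.start()  # CPU cost: stack sampling runs for the whole request
        try:
            await self.app(scope, receive, send)
        finally:
            profiler.stop()
            # Disk cost: every request renders and writes a full HTML report.
            self.out_dir.mkdir(parents=True, exist_ok=True)
            name = f"{time.time_ns()}-{scope.get('method', 'GET')}.html"
            (self.out_dir / name).write_text(profiler.output_html(), encoding="utf-8")


def pyinstrument_factory():
    """Build a Pyinstrument profiler; imported lazily so the extra stays optional."""
    from pyinstrument import Profiler

    return Profiler(async_mode="enabled")
```

In a FastAPI app this could be mounted with `app.add_middleware(ProfileMiddleware, profiler_factory=pyinstrument_factory)`. Taking the factory as a parameter keeps the middleware importable, and testable with a stub, when the profile extras are not installed.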


3. py-spy / memray / snakeviz (dev extras)

Optional profilers declared in pyproject.toml under the profile extras group. Install with pip install -e '.[profile]'.

py-spy (sampling profiler, no code changes)

# find the API pid ([u] keeps grep from matching its own process)
ps -ef | grep '[u]vicorn.*decnet'

# record a 30s flamegraph
sudo py-spy record -o pyspy.svg --pid <pid> --duration 30

There is a helper at scripts/profile/pyspy-attach.sh that documents the common invocations. If py-spy cannot attach (Linux ptrace restrictions), use Pyinstrument instead — the script prints fallback guidance.
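On most Linux distributions the ptrace restriction mentioned above comes from the Yama security module. A quick way to check the current setting before blaming py-spy, as a sketch: the helper name `ptrace_scope` is hypothetical, and the level descriptions are paraphrased from the kernel's Yama documentation.

```python
from pathlib import Path

# Yama ptrace_scope levels, paraphrased from the kernel documentation.
PTRACE_SCOPES = {
    0: "classic: any process of the same user may attach",
    1: "restricted: only ancestors may attach without CAP_SYS_PTRACE",
    2: "admin-only: CAP_SYS_PTRACE required",
    3: "no attach: ptrace attach disabled entirely",
}


def ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the Yama ptrace scope as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None  # Yama not enabled, or not Linux


scope = ptrace_scope()
print(scope, PTRACE_SCOPES.get(scope, "unknown / Yama not available"))
```

At level 1 or 2, running py-spy under sudo (as in the command above) is normally enough. At level 3 attaching is off the table, and Pyinstrument is the fallback.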

memray (memory allocation tracing)

scripts/profile/memray-api.sh            # launches API under memray
memray flamegraph memray.bin             # generate report

Good for hunting leaks in the long-running workers (collector, attacker-worker, sniffer).

snakeviz (cProfile viewer)

python -m cProfile -o out.prof -m decnet.cli services
snakeviz out.prof

Useful for CLI / one-shot paths. For the API, prefer Pyinstrument.
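When no browser is handy (for example over SSH), the same `out.prof` file can be summarised with the standard-library `pstats` module. A minimal sketch; the helper name `top_functions` is illustrative only.

```python
import io
import pstats


def top_functions(prof_path, limit=10):
    """Return a text table of the `limit` most expensive calls by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(prof_path, stream=buf)
    # strip_dirs() shortens file paths; "cumulative" ranks by time spent in a
    # function including everything it calls.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()
```

Calling `print(top_functions("out.prof"))` after the cProfile run above prints the ten heaviest call paths as plain text.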


Production warning

Do not leave DECNET_DEVELOPER_TRACING=true or DECNET_PROFILE_REQUESTS=true on in a real DECNET deployment.

The numbers in Performance Story are measured on a single baseline commit (fb69a06) with every other variable held still, and tracing alone cost roughly 11-13% of throughput (depending on which run is taken as the baseline) and doubled p50. In an attacker-facing honeypot, a slow API means delayed detection, missed correlations, and potentially a noticeable behavioural tell, none of which you want.

Rule: flip on, collect, flip off, commit.
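The "flip off" step is easy to forget in a long-lived shell. One way to catch it is a small pre-deploy check, sketched here. Nothing like this is known to ship with DECNET; the helper name `flags_left_on` and the assumption that only the literal value `true` enables a flag are both hypothetical.

```python
import os

# The two switches this page warns about.
DEBUG_FLAGS = ("DECNET_DEVELOPER_TRACING", "DECNET_PROFILE_REQUESTS")


def flags_left_on(environ=None):
    """Return the names of debug flags still enabled in the environment."""
    environ = os.environ if environ is None else environ
    return [
        name
        for name in DEBUG_FLAGS
        if environ.get(name, "").strip().lower() == "true"
    ]


leftover = flags_left_on()
if leftover:
    print("still enabled:", ", ".join(leftover))
```

A deploy wrapper could refuse to continue (for example via `sys.exit(1)`) whenever the returned list is non-empty.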

Related: Environment variables · Design overview · Logging · Testing and CI.