
Performance Story

A war story. Not a spec sheet. If you want the knobs, see Tracing and Profiling. If you want the env vars, see Environment variables.

DECNET is a honeypot. Honeypots get hammered. If the ingest path melts under load, we lose attacker data — which is the only thing we care about. This page is the story of how we got the API from "falls over at 200 users" to "holds 3.3k RPS at 1500 concurrent users" and what that cost in blood.

All numbers below are real. They come from the nine Locust CSVs in development/profiles/. No fabrication.


Headline table

All runs hit the same FastAPI surface (/api/v1/logs, /healthz, /api/v1/attackers, etc.) via Locust. The Aggregated row is what matters.

| Profile | Users | Config | Requests | Fails | p50 (ms) | p95 (ms) | p99 (ms) | Avg (ms) | RPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| profile_3106d0313507f016_locust.csv | baseline | early code, tracing on | 7 410 | 20 | 740 | 87 000 | 187 000 | 12 999.71 | 5.5 |
| profile_255c2e5.csv | mid | regression, tracing on | 1 042 | 514 | 6 700 | 150 000 | 186 000 | 58 835.59 | 2.3 |
| profile_2dd86fb.csv | mid | tracing on, post-fix | 6 012 | 0 | 240 | 134 000 | 194 000 | 16 217.04 | 2.4 |
| profile_e967aaa.csv | ~1000 | tracing on, cleanups | 259 381 | 0 | 300 | 1 600 | 2 200 | 514.41 | 934.3 |
| profile_fb69a06.csv | ~1000 | tracing on, tuned | 396 672 | 0 | 100 | 1 900 | 2 900 | 465.03 | 963.6 |
| profile_1500_fb69a06.csv | 1500 | tracing ON | 232 648 | 0 | 690 | 6 500 | 9 500 | 1 773.51 | 880.4 |
| profile_1500_notracing_fb69a06.csv | 1500 | tracing OFF | 277 214 | 0 | 340 | 5 700 | 8 400 | 1 489.08 | 992.7 |
| profile_1500_notracing_12_workers_fb69a06.csv | 1500 | tracing OFF, 12 uvicorn workers | 308 024 | 0 | 700 | 2 700 | 4 200 | 929.88 | 1 585.1 |
| profile_1500_notracing_single_core_fb69a06.csv | 1500 | tracing OFF, single core pin | 3 532 | 0 | 270 | 115 000 | 122 000 | 21 728.92 | 46.2 |

(p50/p95/p99 = Locust Median / 95%ile / 99%ile columns. RPS = Current RPS at end of the run.)
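For flavor, this is roughly the shape of a Locustfile that exercises that surface. It is a sketch, not the real harness: the task weights and the log payload fields here are invented, only the endpoints are real.

```python
# Hypothetical Locustfile hitting the same FastAPI surface as the profiles above.
# Task weights and the /api/v1/logs payload shape are assumptions, not DECNET's.
from locust import HttpUser, between, task


class HoneypotUser(HttpUser):
    wait_time = between(0.1, 1.0)  # think time between requests

    @task(10)
    def post_log(self):
        # Write-heavy ingest path; field names are placeholders.
        self.client.post("/api/v1/logs", json={
            "source_ip": "203.0.113.7",
            "payload": "GET /wp-login.php",
        })

    @task(3)
    def list_attackers(self):
        self.client.get("/api/v1/attackers")

    @task(1)
    def healthz(self):
        self.client.get("/healthz")
```

The 1500-user runs correspond to an invocation along the lines of locust -f locustfile.py --users 1500 --csv profile_1500 pointed at the API host.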


1. The baseline: "it works, on Tuesdays"

The earliest usable profile is profile_3106d0313507f016_locust.csv. 7 410 requests, 20 failures, and a p99 of 187 seconds. You read that right — the 99th percentile request took over three minutes to come back. Current RPS at end of run: 5.5.

We were not fast. We were not even slow in a respectable way.

profile_255c2e5.csv is worse: 1 042 requests, 514 failed (49% failure rate), p99 = 186 s, average 58.8 s per request. That is the regression that proved our API could lock itself up completely when everyone tried to write at once.

profile_2dd86fb.csv was the patch that stopped the bleeding: zero failures, but still p95/p99 in the 100–200 s range. The API responded to every request, eventually. That is not what anyone means by "responded."

2. The turnaround: e967aaa and fb69a06

Then two commits changed everything.

profile_e967aaa.csv: 259 381 requests, zero failures, p50=300 ms, p95=1.6 s, p99=2.2 s, average 514 ms, 934 RPS. Two orders of magnitude better on tail latency, and roughly 35× the requests serviced in the baseline run.

profile_fb69a06.csv squeezed more out: 396 672 requests, zero failures, p50=100 ms, p95=1.9 s, p99=2.9 s, average 465 ms, 963 RPS. This is the commit we pinned as our "healthy baseline." Every 1500-user run below is tagged _fb69a06 because we wanted to measure load and config, not code churn.

How? The usual suspects: proper DB connection pooling, eliminating a hot-path N+1, switching the repository layer to the injected get_repository() / get_repo pattern (see CLAUDE.md's DI rule), and no longer synchronously fsync'ing on every insert.
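For the curious, the route-level shape of that injected-repository pattern looks roughly like the sketch below. Apart from get_repository(), every name (LogIn, LogRepository, insert_log) is an invented stand-in, not DECNET's actual models or signatures.

```python
# Illustrative sketch of the injected-repository pattern on the ingest route.
# LogIn, LogRepository, and insert_log are assumed names, not DECNET's real ones.
from fastapi import APIRouter, Depends, FastAPI
from pydantic import BaseModel

app = FastAPI()
router = APIRouter()


class LogIn(BaseModel):
    source_ip: str
    payload: str


class LogRepository:
    """Stand-in for a repository bound to a pooled DB connection."""

    async def insert_log(self, log: LogIn) -> dict:
        # The real implementation writes through the shared pool; no per-insert fsync.
        return {"status": "queued"}


def get_repository() -> LogRepository:
    # Dependency hook: swap the backing store here (SQLite, MySQL, a fake in tests).
    return LogRepository()


@router.post("/api/v1/logs")
async def ingest_log(log: LogIn, repo: LogRepository = Depends(get_repository)):
    return await repo.insert_log(log)


app.include_router(router)
```

The point of the hook is that the request handler never opens its own connection, so pooling and commit policy live in one place instead of being re-decided per route.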

3. 1500 users: the API holds

profile_1500_fb69a06.csv turns the screws: 1500 concurrent users, tracing ON, default uvicorn worker count. Result: 232 648 requests, zero failures, p50=690 ms, p95=6.5 s, p99=9.5 s, 880 RPS.

Zero failures at 1500 users is the first genuine win. Latency got uglier — p95 jumped from 1.9 s to 6.5 s — but nothing fell over. The system is now throughput-limited, not stability-limited. That is a different class of problem.

4. What OpenTelemetry cost us

Compare profile_1500_fb69a06.csv vs profile_1500_notracing_fb69a06.csv. Same code, same load, same host. Only difference: DECNET_DEVELOPER_TRACING=false.

| Metric | Tracing ON | Tracing OFF | Delta |
| --- | --- | --- | --- |
| Total requests | 232 648 | 277 214 | +19% |
| p50 | 690 ms | 340 ms | -51% |
| p95 | 6 500 ms | 5 700 ms | -12% |
| p99 | 9 500 ms | 8 400 ms | -12% |
| Avg | 1 773 ms | 1 489 ms | -16% |
| RPS | 880.4 | 992.7 | +13% |

Auto-instrumented FastAPI tracing is not free. The median request paid a ~350 ms tax, and the tracing-off run got through 19% more requests in the same window. Tails are less affected because they are dominated by I/O wait, not span overhead.

Rule: tracing stays off in production DECNET deployments. It is a development lens. See Tracing and Profiling.
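The gate itself is cheap. A minimal sketch of how that flag can control auto-instrumentation, assuming the stock opentelemetry-instrumentation-fastapi package; the module layout and the "false" default are illustrative, only the env var name comes from the runs above.

```python
# Sketch: gate OpenTelemetry auto-instrumentation on the same env var the
# profiles toggled. Module layout and defaults are assumptions.
import os

from fastapi import FastAPI


def maybe_enable_tracing(app: FastAPI) -> None:
    # Only pay the span-per-request overhead when explicitly asked for.
    if os.getenv("DECNET_DEVELOPER_TRACING", "false").lower() != "true":
        return
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

    FastAPIInstrumentor.instrument_app(app)


app = FastAPI()
maybe_enable_tracing(app)
```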

5. Vertical scaling: 12 workers vs single core

profile_1500_notracing_12_workers_fb69a06.csv: tracing off, uvicorn with 12 workers. Result: 308 024 requests, p50=700 ms, p95=2.7 s, p99=4.2 s, 1 585 RPS.

Going from default workers to 12 bought us: +11% requests served, +60% end-of-run RPS, -53% p95, -50% p99. The tail improvement is the real prize — more workers means fewer requests queued behind a slow one.
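For reference, that run boils down to a worker count on the uvicorn invocation. A sketch, with the import string as a placeholder for wherever DECNET's app object actually lives; the CLI equivalent is uvicorn <module:app> --workers 12.

```python
# One way to run the 12-worker configuration; "app.main:app" is a placeholder,
# not DECNET's actual module path.
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",   # must be an import string when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=12,       # one process per worker; each gets its own GIL
    )
```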

Now the punchline: profile_1500_notracing_single_core_fb69a06.csv. Same config, pinned to one core via CPU affinity. Result: 3 532 requests total, p95=115 s, p99=122 s, average 21.7 s, 46 RPS.

Single-core is a 34x throughput collapse vs 12-workers, and the tail grows from 4 seconds to nearly two minutes. FastAPI + SQLite on one core with 1500 concurrent clients is a queue, not a server.

Vertical scaling holds. Horizontal workers matter. The GIL is real.

6. Where is the bottleneck now?

Reading the 12-worker numbers: 1 585 RPS, p95=2.7 s, with zero failures. That is good, but p95 should be far lower than 2.7 s for an in-memory-ish workload. Candidates:

  1. SQLite single-writer lock. All 12 workers share one attackers.db. SQLite's WAL mode helps readers but writes still serialize. Under /api/v1/logs write amplification we expect queue-behind-writer stalls in exactly this latency envelope. The MySQL backend exists for exactly this reason — see Database drivers.
  2. Python GIL on the aggregation hot path. The single-core profile proves the interpreter is CPU-bound at saturation. 12 workers sidestep the GIL only for independent requests — anything going through a shared lock (DB, in-process cache) re-serializes.
  3. Network stack / event-loop wait on Locust side — less likely, we checked client CPU during the runs.

Best defensible guess: SQLite writer lock first, GIL second. Switching the hot-write path to MySQL (or even PRAGMA journal_mode=WAL + batched inserts) should move p95 under a second at the same RPS. That work is scoped but not landed. See development/FUTURE.md for the queue.
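The cheap half of that fix would look something like the sketch below. Table and column names are placeholders, and the real version belongs inside the repository layer rather than at module scope.

```python
# Rough sketch of the "WAL + batched inserts" option; table/column names are
# placeholders, not DECNET's schema.
import sqlite3

conn = sqlite3.connect("attackers.db")
conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block on the writer
conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs per transaction in WAL mode


def flush_batch(rows: list[tuple[str, str]]) -> None:
    # One transaction per batch instead of one per request: the writer lock is
    # taken once for N inserts, which is where the p95 savings would come from.
    with conn:
        conn.executemany(
            "INSERT INTO logs (source_ip, payload) VALUES (?, ?)",
            rows,
        )
```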


tl;dr

  • From 5 RPS (and a 49%-failure regression) to 1 585 RPS / 0% failure at 1500 concurrent users.
  • Tracing costs ~13% RPS and doubles p50. Keep it off in production.
  • Workers matter. Single-core pinning = 46 RPS and two-minute tails.
  • Next bottleneck: the single SQLite writer. Blame the database, as is tradition.

Related: Design overview · Logging · Tracing and Profiling · Testing and CI.