Performance Story
A war story. Not a spec sheet. If you want the knobs, see Tracing and Profiling. If you want the env vars, see Environment variables.
DECNET is a honeypot. Honeypots get hammered. If the ingest path melts under load, we lose attacker data — which is the only thing we care about. This page is the story of how we got the API from "falls over at 200 users" to "holds 3.3k RPS at 1500 concurrent users" and what that cost in blood.
All numbers below are real. They come from the nine Locust CSVs in
development/profiles/. No fabrication.
Headline table
All runs hit the same FastAPI surface (/api/v1/logs, /healthz,
/api/v1/attackers, etc.) via Locust. The Aggregated row is what
matters.
| Profile | Users | Config | Requests | Fails | p50 (ms) | p95 (ms) | p99 (ms) | Avg (ms) | RPS |
|---|---|---|---|---|---|---|---|---|---|
| profile_3106d0313507f016_locust.csv | baseline | early code, tracing on | 7 410 | 20 | 740 | 87 000 | 187 000 | 12 999.71 | 5.5 |
| profile_255c2e5.csv | mid | regression, tracing on | 1 042 | 514 | 6 700 | 150 000 | 186 000 | 58 835.59 | 2.3 |
| profile_2dd86fb.csv | mid | tracing on, post-fix | 6 012 | 0 | 240 | 134 000 | 194 000 | 16 217.04 | 2.4 |
| profile_e967aaa.csv | ~1000 | tracing on, cleanups | 259 381 | 0 | 300 | 1 600 | 2 200 | 514.41 | 934.3 |
| profile_fb69a06.csv | ~1000 | tracing on, tuned | 396 672 | 0 | 100 | 1 900 | 2 900 | 465.03 | 963.6 |
| profile_1500_fb69a06.csv | 1500 | tracing ON | 232 648 | 0 | 690 | 6 500 | 9 500 | 1 773.51 | 880.4 |
| profile_1500_notracing_fb69a06.csv | 1500 | tracing OFF | 277 214 | 0 | 340 | 5 700 | 8 400 | 1 489.08 | 992.7 |
| profile_1500_notracing_12_workers_fb69a06.csv | 1500 | tracing OFF, 12 uvicorn workers | 308 024 | 0 | 700 | 2 700 | 4 200 | 929.88 | 1 585.1 |
| profile_1500_notracing_single_core_fb69a06.csv | 1500 | tracing OFF, single core pin | 3 532 | 0 | 270 | 115 000 | 122 000 | 21 728.92 | 46.2 |
(p50/p95/p99 = Locust Median / 95%ile / 99%ile columns. RPS = Current RPS at end of the run.)
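For orientation, the CSVs above come from Locust runs shaped roughly like the sketch below. The endpoints are the real API surface; the task weights, payload, and pacing are assumptions for illustration, not the actual load script.

```python
# Hedged sketch of the Locust user behind these CSVs. Endpoints are real;
# weights, payload, and wait times are illustrative.
from locust import HttpUser, between, task


class DecnetApiUser(HttpUser):
    wait_time = between(0.1, 1.0)

    @task(5)
    def ingest_log(self):
        # The write-heavy hot path that dominates the story below.
        self.client.post("/api/v1/logs", json={
            "src_ip": "198.51.100.7",
            "service": "ssh",
            "payload": "root:123456",
        })

    @task(2)
    def list_attackers(self):
        self.client.get("/api/v1/attackers")

    @task(1)
    def healthcheck(self):
        self.client.get("/healthz")
```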
1. The baseline: "it works, on Tuesdays"
The earliest usable profile is profile_3106d0313507f016_locust.csv.
7 410 requests, 20 failures, and a p99 of 187 seconds. You read
that right — the 99th percentile request took over three minutes to
come back. Current RPS at end of run: 5.5.
We were not fast. We were not even slow in a respectable way.
profile_255c2e5.csv is worse: 1 042 requests, 514 failed (49%
failure rate), p99 = 186 s, average 58.8 s per request. That is the
regression that proved our API could lock itself up completely when
everyone tried to write at once.
profile_2dd86fb.csv was the patch that stopped the bleeding: zero
failures, but still p95/p99 in the 100–200 s range. The API responded
to every request, eventually. That is not what anyone means by
"responded."
2. The turnaround: e967aaa and fb69a06
Then two commits changed everything.
profile_e967aaa.csv: 259 381 requests, zero failures, p50=300 ms,
p95=1.6 s, p99=2.2 s, average 514 ms, 934 RPS. Two orders of
magnitude better on tail latency, and throughput up from 2.4 to
934 RPS.
profile_fb69a06.csv squeezed more out: 396 672 requests, zero
failures, p50=100 ms, p95=1.9 s, p99=2.9 s, average 465 ms, 963
RPS. This is the commit we pinned as our "healthy baseline." Every
1500-user run below is tagged _fb69a06 because we wanted to measure
load and config, not code churn.
How? The usual suspects: proper DB connection pooling, elimination of a
hot-path N+1 query, switching the repository layer to the injected
get_repository() / get_repo pattern (see CLAUDE.md's DI rule), and an
end to synchronously fsync'ing on every insert. The DI pattern is
sketched below.
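A minimal sketch of that injected-repository shape, assuming an async SQLAlchemy stack. Beyond the get_repository() name, everything here (AttackerRepository, the table, the route body) is illustrative, not DECNET's actual code.

```python
# Hedged sketch: one pooled engine per process, repository injected per request.
from collections.abc import AsyncIterator

from fastapi import Depends, FastAPI
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

app = FastAPI()

# One engine (and its connection pool) per process -- not one connection
# opened and torn down per request.
engine = create_async_engine("sqlite+aiosqlite:///attackers.db")
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)


class AttackerRepository:
    """Hypothetical repository; the real interface lives behind get_repo."""

    def __init__(self, session):
        self.session = session

    async def bulk_insert_logs(self, rows: list[dict]) -> None:
        # Batched write instead of one fsync'd insert per row.
        ...


async def get_repository() -> AsyncIterator[AttackerRepository]:
    # FastAPI generator dependency: session scoped to the request.
    async with SessionLocal() as session:
        yield AttackerRepository(session)


@app.post("/api/v1/logs")
async def ingest_logs(
    rows: list[dict],
    repo: AttackerRepository = Depends(get_repository),
):
    await repo.bulk_insert_logs(rows)
    return {"accepted": len(rows)}
```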
3. 1500 users: the API holds
profile_1500_fb69a06.csv turns the screws: 1500 concurrent users,
tracing ON, default uvicorn worker count. Result: 232 648 requests,
zero failures, p50=690 ms, p95=6.5 s, p99=9.5 s, 880 RPS.
Zero failures at 1500 users is the first genuine win. Latency got uglier — p95 jumped from 1.9 s to 6.5 s — but nothing fell over. The system is now throughput-limited, not stability-limited. That is a different class of problem.
4. What OpenTelemetry cost us
Compare profile_1500_fb69a06.csv vs profile_1500_notracing_fb69a06.csv.
Same code, same load, same host. Only difference:
DECNET_DEVELOPER_TRACING=false.
| Metric | Tracing ON | Tracing OFF | Delta |
|---|---|---|---|
| Total requests | 232 648 | 277 214 | +19% |
| p50 | 690 ms | 340 ms | -51% |
| p95 | 6 500 ms | 5 700 ms | -12% |
| p99 | 9 500 ms | 8 400 ms | -12% |
| Avg | 1 773 ms | 1 489 ms | -16% |
| RPS | 880.4 | 992.7 | +13% |
Auto-instrumented FastAPI tracing is not free. The median request paid a ~350 ms tax and the API served ~16% fewer requests in the same window. Tails are less affected because they are dominated by I/O wait, not span overhead.
Rule: tracing stays off in production DECNET deployments. It is a development lens. See Tracing and Profiling.
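The toggle itself is cheap to implement. A hedged sketch, assuming DECNET_DEVELOPER_TRACING gates FastAPIInstrumentor (the real opentelemetry-instrumentation-fastapi entry point; the surrounding wiring is illustrative):

```python
# Hedged sketch: auto-instrumentation gated behind the env var.
import os

from fastapi import FastAPI

app = FastAPI()

if os.getenv("DECNET_DEVELOPER_TRACING", "false").lower() == "true":
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

    # Every request now pays for span creation, context propagation, and
    # export -- the ~350 ms median tax measured above.
    FastAPIInstrumentor.instrument_app(app)
```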
5. Vertical scaling: 12 workers vs single core
profile_1500_notracing_12_workers_fb69a06.csv: tracing off, uvicorn
with 12 workers. Result: 308 024 requests, p50=700 ms, p95=2.7
s, p99=4.2 s, 1 585 RPS.
Going from default workers to 12 bought us: +11% total requests, +60% end-of-run RPS, -53% p95, -50% p99. The tail improvement is the real prize: more workers means fewer requests queued behind a slow one.
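What the 12-worker configuration amounts to, as a sketch; the module path decnet.api:app is an assumption:

```python
# Hedged sketch of the 12-worker run. Equivalent CLI:
#   uvicorn decnet.api:app --workers 12
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "decnet.api:app",  # import string is required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=12,        # one process per core sidesteps the GIL
    )
```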
Now the punchline: profile_1500_notracing_single_core_fb69a06.csv.
Same config, pinned to one core via CPU affinity. Result: 3 532
requests total, p95=115 s, p99=122 s, average 21.7 s, 46
RPS.
Single-core is a 34x throughput collapse vs 12 workers, and the p99 tail grows from 4.2 seconds to just over two minutes. FastAPI + SQLite on one core with 1500 concurrent clients is a queue, not a server.
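For reference, the pin can be a one-liner on Linux; whether these runs used this call or an external taskset is not recorded here:

```python
# Hedged sketch: restrict the current process (pid 0 = self) to core 0 only.
import os

os.sched_setaffinity(0, {0})
```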
Vertical scaling holds. Horizontal workers matter. The GIL is real.
6. Where is the bottleneck now?
Reading the 12-worker numbers: 1 585 RPS, p95=2.7 s, with zero failures. That is good, but p95 should be far lower than 2.7 s for an in-memory-ish workload. Candidates:
- SQLite single-writer lock. All 12 workers share one attackers.db. SQLite's WAL mode helps readers, but writes still serialize. Under /api/v1/logs write amplification we expect queue-behind-writer stalls in exactly this latency envelope. The MySQL backend exists for exactly this reason; see Database drivers.
- Python GIL on the aggregation hot path. The single-core profile proves the interpreter is CPU-bound at saturation. 12 workers sidestep the GIL only for independent requests; anything going through a shared lock (DB, in-process cache) re-serializes.
- Network stack / event-loop wait on the Locust side; less likely, since we checked client CPU during the runs.
Best defensible guess: SQLite writer lock first, GIL second.
Switching the hot-write path to MySQL (or even PRAGMA journal_mode=WAL + batched inserts) should move p95 under a second at
the same RPS. That work is scoped but not landed. See
development/FUTURE.md for the queue.
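A hedged sketch of the WAL-plus-batching mitigation named above; the table and column names are assumptions:

```python
# Hedged sketch: WAL mode plus batched inserts on the hot write path.
import sqlite3

conn = sqlite3.connect("attackers.db")
conn.execute("PRAGMA journal_mode=WAL")    # readers stop blocking behind the writer
conn.execute("PRAGMA synchronous=NORMAL")  # fsync at checkpoints, not per transaction


def flush_batch(rows: list[tuple[str, str, str]]) -> None:
    # One transaction per batch: the single writer lock is taken once per
    # N rows instead of once per request.
    with conn:
        conn.executemany(
            "INSERT INTO logs (src_ip, service, payload) VALUES (?, ?, ?)",
            rows,
        )
```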
tl;dr
- From 5 RPS (with a 49%-failure regression along the way) to 1 585 RPS / 0% failure at 1500 concurrent users.
- Tracing costs ~13% RPS and doubles p50. Keep it off in production.
- Workers matter. Single-core pinning = 46 RPS and two-minute tails.
- Next bottleneck: the single SQLite writer. Blame the database, as is tradition.
Related: Design overview · Logging · Tracing and Profiling · Testing and CI.