docs: record single-worker / multi-worker perf baseline
Capture Locust numbers from the fb69a06 branch across five
configurations so future regressions have something to measure against.
- 500u tracing-on single-worker: ~960 RPS / p99 2.9 s
- 1500u tracing-on single-worker: ~880 RPS / p99 9.5 s
- 1500u tracing-off single-worker: ~990 RPS / p99 8.4 s
- 1500u tracing-off pinned to one core: ~46 RPS / p99 122 s
- 1500u tracing-off 12 workers: ~1585 RPS / p99 4.2 s
Also note the MySQL max_connections math ((pool_size + max_overflow) *
workers = 720) to explain why the default 151 needs bumping, and the
Python 3.14 GC segfault so nobody repeats that mistake.
README.md | 55
@@ -706,6 +706,61 @@ locust -f tests/stress/locustfile.py --host http://localhost:8000
| `STRESS_SPIKE_USERS` | `1000` | Users for thundering herd test |
| `STRESS_SUSTAINED_USERS` | `200` | Users for sustained load test |
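Assuming the stress suite reads these knobs from the environment (their naming and the table above suggest so), a heavier run is just an override before launching Locust; the numbers here are arbitrary examples:

```shell
# Hypothetical override: double the spike and sustained user counts
# before running the locustfile shown above.
export STRESS_SPIKE_USERS=2000
export STRESS_SUSTAINED_USERS=400
```

With those exported, the same `locust -f tests/stress/locustfile.py` invocation picks them up.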
#### Measured baseline

Reference numbers from recent Locust runs against a MySQL backend
(asyncmy driver). All runs completed with zero failures throughout.

**Single worker** (unless noted):

| Metric | 500u, tracing on | 1500u, tracing on | 1500u, tracing **off** | 1500u, tracing off, **pinned to 1 core** | 1500u, tracing off, **12 workers** |
|---|---|---|---|---|---|
| Requests served | 396,672 | 232,648 | 277,214 | 3,532 | 308,024 |
| Failures | 0 | 0 | 0 | 0 | 0 |
| Throughput (current RPS) | ~960 | ~880 | ~990 | ~46 | ~1,585 |
| Average latency | 465 ms | 1,774 ms | 1,489 ms | 21.7 s | 930 ms |
| Median (p50) | 100 ms | 690 ms | 340 ms | 270 ms | 700 ms |
| p95 | 1.9 s | 6.5 s | 5.7 s | 115 s | 2.7 s |
| p99 | 2.9 s | 9.5 s | 8.4 s | 122 s | 4.2 s |
| Max observed | 8.3 s | 24.4 s | 20.9 s | 124.5 s | 16.5 s |

Ramp is 15 users/s for the 500u column, 40 users/s otherwise.

Takeaways:

- **Tracing off**: at 1500 users, flipping `DECNET_TRACING=false`
  halves p50 (690 → 340 ms) and pushes RPS from ~880 past the
  500-user figure on a single worker.
- **12 workers**: RPS scales ~1.6× over a single worker (~990 →
  ~1,585). Sublinear, because the workload is DB-bound: MySQL and the
  connection pool become the new ceiling, not Python. p99 drops from
  8.4 s to 4.2 s.
- **Connection math**: each worker's pool can grow to `DECNET_DB_POOL_SIZE=20`
  plus `DECNET_DB_MAX_OVERFLOW=40` = 60 connections, so 12 workers mean
  720 connections at peak. MySQL's default `max_connections=151` needs
  bumping (we used 2000) before running multi-worker load.
- **Single-core pinning**: ~46 RPS with p95 near two minutes. Interesting
  as a "physics floor" datapoint, not a production config.

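The connection budget can be sanity-checked in a few lines. This assumes the usual SQLAlchemy multi-process setup, where every worker builds its own engine and each pool may open up to `pool_size + max_overflow` connections:

```python
# Peak connection demand across a multi-worker deployment.
# SQLAlchemy lets each pool grow to pool_size + max_overflow connections,
# and every worker process owns an independent engine and pool.
pool_size = 20      # DECNET_DB_POOL_SIZE
max_overflow = 40   # DECNET_DB_MAX_OVERFLOW
workers = 12

per_worker_peak = pool_size + max_overflow   # 60 connections per worker
cluster_peak = per_worker_peak * workers     # 720 connections total

mysql_default_max_connections = 151
print(cluster_peak)                                   # 720
print(cluster_peak > mysql_default_max_connections)   # True -> raise it
```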
Top endpoints by volume: `/api/v1/attackers`, `/api/v1/deckies`,
`/api/v1/bounty`, `/api/v1/logs/histogram`, `/api/v1/config`,
`/api/v1/health`, `/api/v1/auth/login`, `/api/v1/logs`.

Notes on tuning:

- **Python 3.14 is currently a no-go for the API server.** Under heavy
  concurrent async load the reworked 3.14 GC segfaults inside
  `mark_all_reachable` (observed in `_PyGC_Collect` during pending-GC
  on 3.14.3). Stick to Python 3.11–3.13 until upstream stabilises.
- Router-level TTL caches on hot count/stats endpoints (`/stats`,
  `/logs` count, `/attackers` count, `/bounty`, `/logs/histogram`,
  `/deckies`, `/config`) collapse concurrent duplicate work onto a
  single DB hit per window, which is essential to reach this RPS on
  one worker.
- Turning off request tracing (`DECNET_TRACING=false`) is the next
  free headroom: tracing was still on in the first two columns of the
  table above.
- On SQLite, `DECNET_DB_POOL_PRE_PING=false` skips the per-checkout
  `SELECT 1`. On MySQL, keep it `true`: network disconnects are real.

#### System tuning: open file limit
Under heavy load (500+ concurrent users), the server will exhaust the default Linux open file limit (`ulimit -n`), causing `OSError: [Errno 24] Too many open files`. Most distros default to **1024**, which is far too low for stress testing or production use.
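A quick way to inspect and raise the limit for the current shell (a sketch; persistent limits belong in `/etc/security/limits.conf` or a systemd unit's `LimitNOFILE=`):

```shell
# Current soft limit -- this is what the server process actually gets
ulimit -Sn

# Raise the soft limit up to the hard-limit ceiling for this shell.
# Raising the hard limit itself requires root.
hard=$(ulimit -Hn)
[ "$hard" = "unlimited" ] || ulimit -n "$hard"

ulimit -Sn   # verify the new soft limit
```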