fix(stress): unblock Locust runs from login rate-limit self-DoS

Locust spawns N virtual users (default 1000), all logging in from
127.0.0.1 as admin. /auth/login is rate-limited to 10 requests per
5 minutes, per IP AND per username, so the 11th on_start() received a
429 and raised a RuntimeError. A @task(2) login in the task weights
turned the whole run into a 429 factory even after ramp-up. And
_login_with_retry treated 429 as non-retryable, so there was no
graceful degradation path.

Three changes, one root cause:

- decnet/web/limiter.py: read DECNET_LIMITER_ENABLED (default true).
  When false, slowapi's Limiter(enabled=False) makes @limiter.limit a
  no-op. Default ships unchanged; nobody should ever release with this
  off.
- tests/stress/conftest.py: set DECNET_LIMITER_ENABLED=false in the
  uvicorn subprocess env. Stress tests measure throughput, not rate
  limiting.
- tests/stress/locustfile.py: drop the @task(2) login — it added zero
  coverage (every user already logs in at on_start) and only generated
  contention. Teach _login_with_retry to honour 429 + Retry-After so a
  Locust pointed at a limiter-enabled server degrades gracefully
  instead of crashing on_start.
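
The env toggle and the 429 handling described above can be sketched
roughly as follows. This is an illustration, not the code in this
commit: limiter_enabled, retry_after_seconds, login_with_retry, the
accepted "false" spellings, the 1 s default delay, the 30 s cap and the
5-attempt budget are all assumptions made for the example.

```python
import time
from typing import Callable, Mapping, Tuple

# (status_code, headers) pair standing in for a real HTTP response.
Response = Tuple[int, Mapping[str, str]]


def limiter_enabled(env: Mapping[str, str]) -> bool:
    """Parse DECNET_LIMITER_ENABLED; an unset variable means enabled.

    Which spellings count as "off" is an assumption of this sketch.
    """
    raw = env.get("DECNET_LIMITER_ENABLED", "true")
    return raw.strip().lower() not in {"false", "0", "no", "off"}


def retry_after_seconds(headers: Mapping[str, str],
                        default: float = 1.0,
                        cap: float = 30.0) -> float:
    """Return the Retry-After delay in seconds, clamped to [0, cap].

    Only the delta-seconds form is parsed. A missing header or an
    HTTP-date value falls back to `default` (hypothetical policy).
    A plain dict lookup is case-sensitive, unlike real HTTP headers.
    """
    raw = headers.get("Retry-After")
    try:
        delay = float(raw)
    except (TypeError, ValueError):
        delay = default
    return max(0.0, min(delay, cap))


def login_with_retry(send_login: Callable[[], Response],
                     max_attempts: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry a login on 429, waiting as instructed by Retry-After.

    Returns True on HTTP 200, False on any other non-429 status or
    once the attempt budget is exhausted.
    """
    for attempt in range(max_attempts):
        status, headers = send_login()
        if status == 200:
            return True
        if status != 429:
            return False  # other failures are not retried here
        if attempt < max_attempts - 1:
            sleep(retry_after_seconds(headers))
    return False
```

With slowapi, the parsed flag would be passed as
Limiter(..., enabled=limiter_enabled(os.environ)). Injecting sleep
keeps the retry loop testable without real waiting.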
2026-04-24 00:13:15 -04:00
parent ae92948e22
commit d61e143b71
3 changed files with 44 additions and 7 deletions


@@ -72,6 +72,12 @@ def stress_server():
         "DECNET_DEVELOPER_TRACING": "false",
         "DECNET_DB_TYPE": "sqlite",
         "DECNET_MODE": "master",
+        # Locust hammers /auth/login from a single host as a single
+        # user. The production 10/5min per-IP + per-user limits would
+        # kill ramp-up past the 11th virtual user. Stress tests are
+        # measuring throughput, not rate-limiting; disable in this
+        # subprocess only.
+        "DECNET_LIMITER_ENABLED": "false",
     })
     proc = subprocess.Popen(
         [