fix(tests/stress): eliminate 0-request flakes in locust runs

Three independent issues conspired to make stress tests record 0 requests:

1. Every virtual user did /auth/login in on_start. With 1000 users in a
   spike window, bcrypt-bound logins never finished and on_start failed
   for all users, so aggregated requests stayed at 0. Pre-fetch a single
   admin token in the fixture (cached per-host) and pass it via
   DECNET_STRESS_TOKEN so locust users skip the login storm.

2. Locust exits non-zero on any request failure by default, causing
   run_locust to throw away an otherwise valid stats CSV. Pass
   --exit-code-on-error 0 so per-test assertions are the only fail gate.

3. test_stress_sustained ran two locust subprocesses against the same
   uvicorn. Phase 1's keep-alive connections wedged phase 2 into 0
   recorded requests ~2/3 of the time. Refactored stress_server into a
   start_stress_server() context manager and gave each phase its own
   uvicorn.

Stable 3/3 on full suite, 3/3 on test_stress_sustained alone.
@@ -9,7 +9,13 @@ import os
 
 import pytest
 
-from tests.stress.conftest import run_locust, STRESS_USERS, STRESS_SPAWN_RATE, STRESS_DURATION
+from tests.stress.conftest import (
+    run_locust,
+    start_stress_server,
+    STRESS_USERS,
+    STRESS_SPAWN_RATE,
+    STRESS_DURATION,
+)
 
 
 # Assertion thresholds (overridable via env)
@@ -114,43 +120,35 @@ def test_stress_spike(stress_server):
 
 
 @pytest.mark.stress
-def test_stress_sustained(stress_server):
+def test_stress_sustained():
     """Sustained load: 200 users for 30s. Checks latency doesn't degrade >3x.
 
-    Runs two phases:
+    Runs two phases against independent uvicorns. Sharing a server between
+    phases leaks keep-alive connections from phase 1 into phase 2 and the
+    sustained run records 0 requests roughly two-thirds of the time.
     1. Warm-up (10s) to get baseline latency
     2. Sustained (30s) to check for degradation
     """
     sustained_users = int(os.environ.get("STRESS_SUSTAINED_USERS", "200"))
 
     # Cap spawn rate at 100/s — locust itself warns above that and has been
     # observed to record 0 requests when the spawn storm collides with a
     # still-draining uvicorn from a prior phase.
     ramp = min(sustained_users, 100)
 
     # Phase 1: warm-up baseline
-    env_warmup = run_locust(
-        host=stress_server,
-        users=sustained_users,
-        spawn_rate=ramp,
-        duration=10,
-    )
+    with start_stress_server() as warm_url:
+        env_warmup = run_locust(
+            host=warm_url,
+            users=sustained_users,
+            spawn_rate=ramp,
+            duration=10,
+        )
     baseline_avg = env_warmup.stats.total.avg_response_time
     _print_stats(env_warmup, f"SUSTAINED warm-up: {sustained_users} users, 10s")
 
-    # Let the server drain pending work before firing the second locust run;
-    # otherwise the first request in phase 2 can sit behind a queued backlog
-    # and the 30s window can finish with 0 recorded requests.
-    import time as _t
-    _t.sleep(5)
 
     # Phase 2: sustained
-    env_sustained = run_locust(
-        host=stress_server,
-        users=sustained_users,
-        spawn_rate=ramp,
-        duration=30,
-    )
+    with start_stress_server() as sustained_url:
+        env_sustained = run_locust(
+            host=sustained_url,
+            users=sustained_users,
+            spawn_rate=ramp,
+            duration=30,
+        )
     sustained_avg = env_sustained.stats.total.avg_response_time
     _print_stats(env_sustained, f"SUSTAINED main: {sustained_users} users, 30s")
 
 
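
The body of start_stress_server() is not part of the hunks shown here. A minimal sketch of the pattern it describes (one fresh server process per `with` block, torn down on exit) might look like the following. Everything in it is an assumption for illustration: the name `start_server`, the helpers, the readiness probe, and the stdlib `http.server` stand-in, which is used only so the sketch runs without uvicorn installed.

```python
import contextlib
import socket
import subprocess
import sys
import time
from typing import Callable, Iterator, List


def _free_port() -> int:
    # Ask the OS for an unused port. Another process could grab it before
    # the server binds; that race is acceptable for a sketch.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def _wait_until_listening(port: int, timeout: float = 10.0) -> None:
    # Poll until the server accepts TCP connections or the deadline passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=0.5):
                return
        except OSError:
            time.sleep(0.1)
    raise RuntimeError(f"server on port {port} did not come up in {timeout}s")


def _stdlib_server_cmd(port: int) -> List[str]:
    # Stand-in for the real uvicorn command line (assumption).
    return [sys.executable, "-m", "http.server", str(port), "--bind", "127.0.0.1"]


@contextlib.contextmanager
def start_server(
    make_cmd: Callable[[int], List[str]] = _stdlib_server_cmd,
) -> Iterator[str]:
    """Start a fresh server process, yield its base URL, stop it on exit."""
    port = _free_port()
    proc = subprocess.Popen(
        make_cmd(port), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        _wait_until_listening(port)
        yield f"http://127.0.0.1:{port}"
    finally:
        # Each phase gets its own process, so no keep-alive connections
        # or queued work can carry over into the next phase.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

To point the sketch at uvicorn instead, `make_cmd` could return something like `[sys.executable, "-m", "uvicorn", "<module>:<app>", "--port", str(port)]`, where the application path is a placeholder this commit does not show.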