fix(security): close LOW ASVS findings — env bypass, SSE/deployment authz, CN fail-close, password byte-limit, exception leaks, BUG-12..16
Auth/session (V2.1.7, V4.1.5, V4.1.6, V2.1.4/V2.1.5): - env secret validation no longer bypassed by attacker-injectable PYTEST* env; gated on explicit DECNET_TESTING=1 (set only in conftest). - must_change_password now enforced on the SSE header-JWT path, not just ticket mint. - GET /system/deployment-mode requires viewer auth (was leaking role + topology size). - CreateUser/ResetUser passwords min_length=12; passwords >72 bytes rejected explicitly instead of bcrypt silently truncating. Swarm ingestion (V9.1.3, BUG-16): - Log listener hard-rejects peers with unparseable/empty cert CN (fail closed, ingests nothing) instead of tagging 'unknown'. - Shutdown handlers no longer swallow real errors (narrowed to CancelledError). Info leakage (V7.1.2, V14.1.2): - Exception text sanitized on swarm-update, health, tarpit, realism, file-drop, blank-topology endpoints (raw tc/docker stderr, DB/Docker errors logged server-side, generic detail returned). pyproject license corrected to AGPL-3.0. Correctness (BUG-12..16): - BUG-12 atomic credential upsert (UNIQUE constraint + IntegrityError retry, consistent principal_key canonicalization). - BUG-13 rule-tail watermark uses >= with seen-id dedup (no same-second drop). - BUG-14 worker wake cleared before wait (no lost wake during tick). - BUG-15 intel gather tolerates an unexpected provider raise. - BUG-16 see above. Already-closed (verified, no change): V2.1.6, V5.1.3, V9.1.2. Accept-risk + documented: V2.1.8 cache window, V3.1.3 idle timeout. Tests added for every fix; unanimous adversarial review after two refute-fix rounds.
This commit is contained in:
@@ -181,3 +181,117 @@ async def test_clusterer_registered_in_cli():
|
||||
"""`decnet clusterer` is registered as a master-only command."""
|
||||
from decnet.cli.gating import MASTER_ONLY_COMMANDS
|
||||
assert "clusterer" in MASTER_ONLY_COMMANDS
|
||||
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_wake_during_tick_is_not_lost(repo):
|
||||
"""BUG-14 regression: wake.clear() must run BEFORE wake.wait(), not after.
|
||||
|
||||
The worker loop pattern:
|
||||
|
||||
Fixed (after fix): tick → clear → wait ← current code
|
||||
Buggy (before fix): tick → wait → clear
|
||||
|
||||
The race in the buggy pattern: a wake.set() could arrive from a _wake_on
|
||||
background task between wait() returning and clear() executing. In asyncio
|
||||
the task switch requires an ``await``; the original code had
|
||||
``await _publish_result`` between wait() and clear(), providing a real
|
||||
window. The fix closes this by moving clear() to run immediately after
|
||||
tick (before wait), so there is no window between wait() returning and the
|
||||
next clear().
|
||||
|
||||
**Structural test (red-before / green-after, deterministic):**
|
||||
We intercept the internal ``wake`` event's ``clear()`` and ``wait()``
|
||||
methods to record their invocation order, then assert that every
|
||||
``clear`` call is immediately followed by a ``wait`` (never preceded by
|
||||
one). Reverting the worker to the buggy ``wait → clear`` order produces
|
||||
``("wait", "clear")`` consecutive pairs, which the assertion catches
|
||||
deterministically without relying on wall-clock timing races.
|
||||
"""
|
||||
import unittest.mock as _mock
|
||||
|
||||
captured_wake: list[asyncio.Event] = []
|
||||
call_log: list[str] = []
|
||||
|
||||
_orig_event_cls = asyncio.Event
|
||||
|
||||
class _LoggingEvent(_orig_event_cls): # type: ignore[misc]
|
||||
"""Subclass captures the first event created (the wake event) and
|
||||
logs clear()/wait() calls so we can verify their relative order."""
|
||||
|
||||
def __init__(self) -> None:
|
||||
super().__init__()
|
||||
if not captured_wake:
|
||||
captured_wake.append(self)
|
||||
|
||||
def clear(self) -> None:
|
||||
if captured_wake and self is captured_wake[0]:
|
||||
call_log.append("clear")
|
||||
super().clear()
|
||||
|
||||
async def wait(self) -> bool: # type: ignore[override]
|
||||
if captured_wake and self is captured_wake[0]:
|
||||
call_log.append("wait")
|
||||
return await super().wait()
|
||||
|
||||
class _SimpleTicker(Clusterer):
|
||||
name = "bug14_ticker"
|
||||
|
||||
async def tick(self, _repo) -> ClusterResult:
|
||||
return ClusterResult()
|
||||
|
||||
shutdown = asyncio.Event()
|
||||
|
||||
with _mock.patch("decnet.clustering.worker.asyncio.Event", _LoggingEvent):
|
||||
task = asyncio.create_task(
|
||||
run_clusterer_loop(
|
||||
repo,
|
||||
poll_interval_secs=0.1,
|
||||
clusterer=_SimpleTicker(),
|
||||
shutdown=shutdown,
|
||||
)
|
||||
)
|
||||
# Run for long enough to accumulate several clear/wait pairs.
|
||||
await asyncio.sleep(0.5)
|
||||
shutdown.set()
|
||||
await asyncio.wait_for(task, timeout=2.0)
|
||||
|
||||
# Must have enough events to be meaningful.
|
||||
assert len(call_log) >= 4, (
|
||||
f"BUG-14: too few clear/wait calls recorded ({call_log!r}); "
|
||||
"loop may not have run or event capture failed"
|
||||
)
|
||||
|
||||
# Fixed invariant: every loop iteration runs clear() BEFORE wait().
|
||||
# The resulting call_log for the fixed code is: clear, wait, clear, wait, ...
|
||||
# For the buggy code (wait → clear) the log would be: wait, clear, wait, clear, ...
|
||||
#
|
||||
# We verify TWO conditions that together guarantee the fixed order:
|
||||
#
|
||||
# 1. The log starts with "clear" — the very first thing after tick is a clear.
|
||||
# Buggy code starts with "wait" (wait ran first in the original loop).
|
||||
#
|
||||
# 2. Within each (clear, wait) pair at positions (2k, 2k+1), clear always
|
||||
# precedes wait. We check that no "clear" appears at an ODD index (which
|
||||
# would mean a clear followed another clear, or a wait appeared before
|
||||
# the next clear).
|
||||
assert call_log[0] == "clear", (
|
||||
f"BUG-14 regression: first wake call was {call_log[0]!r}, expected 'clear'. "
|
||||
f"Full log: {call_log!r}. The worker is using the buggy 'wait-then-clear' "
|
||||
"order — wake.clear() must execute BEFORE wake.wait() each iteration."
|
||||
)
|
||||
# Verify the alternating pattern holds: indices 0,2,4,... should be "clear"
|
||||
# and indices 1,3,5,... should be "wait".
|
||||
for idx, call in enumerate(call_log):
|
||||
if idx % 2 == 0:
|
||||
assert call == "clear", (
|
||||
f"BUG-14 regression: expected 'clear' at position {idx} but got "
|
||||
f"{call!r} in log {call_log!r}. This indicates the loop is NOT "
|
||||
"using the fixed 'clear → wait' order within each iteration."
|
||||
)
|
||||
else:
|
||||
assert call == "wait", (
|
||||
f"BUG-14 regression: expected 'wait' at position {idx} but got "
|
||||
f"{call!r} in log {call_log!r}. This indicates the loop is NOT "
|
||||
"using the fixed 'clear → wait' order within each iteration."
|
||||
)
|
||||
|
||||
Reference in New Issue
Block a user