feat(webhooks): circuit breaker auto-disables misbehaving subscriptions

After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed
deliveries, the worker calls trip_webhook_circuit(uuid, ts), which
flips enabled=False and stamps auto_disabled_at. The worker also sets
its reload flag so the next dispatch epoch stops consuming events for
the tripped sub entirely. One dead receiver can no longer poison the
shared egress pool.
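
A minimal sketch of the worker side, assuming an in-memory stand-in for
the real worker. DispatchLoop, begin_epoch, delivery_failed and
delivery_succeeded are hypothetical names, and flipping the local
`enabled` map stands in for the trip_webhook_circuit call. Only the env
var name and its default of 5 come from this commit; resetting the count
on a successful delivery is an assumption drawn from "consecutive".

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List


def circuit_threshold() -> int:
    # DECNET_WEBHOOK_CIRCUIT_THRESHOLD with the documented default of 5.
    return int(os.environ.get("DECNET_WEBHOOK_CIRCUIT_THRESHOLD", "5"))


@dataclass
class DispatchLoop:
    """Hypothetical worker state; `enabled` mirrors the DB column per sub uuid."""

    enabled: Dict[str, bool]
    threshold: int = field(default_factory=circuit_threshold)
    failures: Dict[str, int] = field(default_factory=dict)
    reload_requested: bool = False
    consumers: List[str] = field(default_factory=list)

    def begin_epoch(self) -> List[str]:
        # Rebuild the consumer set on first run or when a trip asked for a reload;
        # tripped (disabled) subs drop out here and stop consuming events.
        if self.reload_requested or not self.consumers:
            self.consumers = sorted(u for u, on in self.enabled.items() if on)
            self.reload_requested = False
        return self.consumers

    def delivery_failed(self, uuid: str) -> None:
        self.failures[uuid] = self.failures.get(uuid, 0) + 1
        if self.failures[uuid] >= self.threshold:
            self.enabled[uuid] = False  # stands in for trip_webhook_circuit(uuid, ts)
            self.reload_requested = True

    def delivery_succeeded(self, uuid: str) -> None:
        # Assumed: any success breaks the streak of consecutive failures.
        self.failures[uuid] = 0
```

The reload flag defers the change to an epoch boundary, so an in-flight
epoch is never reshuffled mid-dispatch.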

The operator clears the trip via PATCH: setting enabled=True on a
previously disabled sub clears auto_disabled_at, zeroes
consecutive_failures, and clears last_error. An admin-pause followed
by a re-enable hits the same path harmlessly.
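
The re-enable rule as a plain function, assuming a hypothetical
WebhookRow record and apply_patch helper rather than the real PATCH
handler. The reset keys off the disabled-to-enabled transition, so an
admin-paused sub takes the same path; its auto_disabled_at is already
NULL there, which is why the reset is harmless.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class WebhookRow:
    enabled: bool
    auto_disabled_at: Optional[datetime] = None
    consecutive_failures: int = 0
    last_error: Optional[str] = None


def apply_patch(row: WebhookRow, patch: Dict[str, Any]) -> WebhookRow:
    """Hypothetical PATCH body: a disabled -> enabled transition resets the breaker."""
    was_disabled = not row.enabled
    for key, value in patch.items():
        if not hasattr(row, key):
            raise KeyError(f"unknown field: {key}")
        setattr(row, key, value)
    if was_disabled and row.enabled:
        # Tripped sub: this clears the trip. Admin-paused sub: auto_disabled_at
        # is already NULL, so the same reset does no harm.
        row.auto_disabled_at = None
        row.consecutive_failures = 0
        row.last_error = None
    return row
```

Patching enabled=True on an already-active sub leaves the counter alone,
since there is no transition to react to.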

Three observable states are now distinguishable in the UI:
- Active              enabled=True,  auto_disabled_at=NULL
- Admin-paused        enabled=False, auto_disabled_at=NULL
- Tripped             enabled=False, auto_disabled_at=<ts>
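
The table maps directly onto a small classifier. A sketch with
hypothetical SubState/classify names; the combination enabled=True with
auto_disabled_at set should not occur once re-enabling clears the stamp,
and this sketch simply reports it as Active.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class SubState(Enum):
    ACTIVE = "Active"
    ADMIN_PAUSED = "Admin-paused"
    TRIPPED = "Tripped"


def classify(enabled: bool, auto_disabled_at: Optional[datetime]) -> SubState:
    """Derive the UI state from the two stored columns."""
    if enabled:
        # A leftover stamp on an enabled sub is treated as Active.
        return SubState.ACTIVE
    if auto_disabled_at is None:
        return SubState.ADMIN_PAUSED
    return SubState.TRIPPED
```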

The UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled)
and an "N TRIPPED" count in the page header. A hover tooltip tells the
operator how to reset ("Re-enable via Edit").
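
A sketch of the chip text and header count, assuming ISO-8601 seconds
precision for <ts> (the commit does not specify a format). chip_label
and tripped_count_label are hypothetical helpers, not the UI's actual
code.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# (enabled, auto_disabled_at) pairs, one per subscription row.
Row = Tuple[bool, Optional[datetime]]


def is_tripped(enabled: bool, auto_disabled_at: Optional[datetime]) -> bool:
    return not enabled and auto_disabled_at is not None


def chip_label(auto_disabled_at: datetime) -> str:
    # Assumed <ts> format: ISO-8601 to the second.
    return f"TRIPPED · {auto_disabled_at.isoformat(timespec='seconds')}"


def tripped_count_label(rows: Iterable[Row]) -> str:
    # Admin-paused rows (disabled, no stamp) are not counted.
    n = sum(1 for enabled, ts in rows if is_tripped(enabled, ts))
    return f"{n} TRIPPED"
```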

record_webhook_failure now returns the new consecutive_failures count,
so the worker can compare it against the threshold without a second
round trip. trip_webhook_circuit is idempotent: re-tripping only
re-stamps auto_disabled_at.
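
An in-memory sketch of that repo contract, assuming hypothetical
InMemoryWebhookRepo, WebhookState and handle_failure names; only the two
method names and their argument order are taken from the test in this
commit. The worker does one write, reads the count straight off the
return value, and trips the circuit once it reaches the threshold.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class WebhookState:
    enabled: bool = True
    consecutive_failures: int = 0
    last_error: Optional[str] = None
    auto_disabled_at: Optional[datetime] = None


class InMemoryWebhookRepo:
    """Hypothetical stand-in that mirrors only the contract in this commit."""

    def __init__(self) -> None:
        self.rows: Dict[str, WebhookState] = {}

    async def record_webhook_failure(self, uuid: str, ts: datetime, error: str) -> int:
        # `ts` is accepted to match the call shape; this sketch does not store it.
        row = self.rows[uuid]
        row.consecutive_failures += 1
        row.last_error = error
        return row.consecutive_failures  # new count, so no second read is needed

    async def trip_webhook_circuit(self, uuid: str, ts: datetime) -> None:
        # Idempotent: a re-trip rewrites the same two fields with a fresh stamp.
        row = self.rows[uuid]
        row.enabled = False
        row.auto_disabled_at = ts


async def handle_failure(
    repo: InMemoryWebhookRepo, uuid: str, ts: datetime, error: str, threshold: int = 5
) -> bool:
    """Hypothetical worker hook; True when the count is at or past the threshold."""
    count = await repo.record_webhook_failure(uuid, ts, error)
    if count >= threshold:
        await repo.trip_webhook_circuit(uuid, ts)  # safe to repeat
        return True
    return False


if __name__ == "__main__":
    repo = InMemoryWebhookRepo()
    repo.rows["sub-1"] = WebhookState()
    now = datetime.now()
    tripped = [asyncio.run(handle_failure(repo, "sub-1", now, "503")) for _ in range(5)]
    print(tripped, repo.rows["sub-1"])
```

Because tripping is idempotent, a failure that races the reload and
lands after the trip simply refreshes the stamp instead of erroring.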

Closes THREAT_MODEL WH-02 and DEBT-037 §1.
2026-04-24 16:24:33 -04:00
parent ee682eef65
commit 2bcef50ac5
10 changed files with 213 additions and 17 deletions

@@ -210,6 +210,58 @@ async def test_https_url_has_no_warning(
    assert res.json()["warnings"] == []


@pytest.mark.asyncio
async def test_reenabling_clears_circuit_trip(
    client: httpx.AsyncClient, auth_token: str
):
    """Re-enabling via PATCH clears auto_disabled_at + consecutive_failures.

    Simulates the full circuit-breaker lifecycle: create → tripped (via
    direct repo calls, since we can't easily force N worker failures in an
    API-only test) → re-enable via PATCH → verify state cleared.
    """
    from datetime import datetime, timezone

    from decnet.web.dependencies import repo

    create = await client.post(
        PATH,
        json={
            "name": "wh-trip",
            "url": "https://example.com/x",
            "topic_patterns": ["system.>"],
        },
        headers={"Authorization": f"Bearer {auth_token}"},
    )
    assert create.status_code == 201
    uuid = create.json()["uuid"]

    # Simulate the circuit tripping with direct repo calls.
    now = datetime.now(timezone.utc)
    await repo.record_webhook_failure(uuid, now, "503 service unavailable")
    await repo.record_webhook_failure(uuid, now, "503 service unavailable")
    await repo.trip_webhook_circuit(uuid, now)

    pre = await client.get(
        f"{PATH}{uuid}", headers={"Authorization": f"Bearer {auth_token}"}
    )
    assert pre.json()["enabled"] is False
    assert pre.json()["auto_disabled_at"] is not None
    assert pre.json()["consecutive_failures"] >= 1

    # Re-enable via PATCH: should clear the trip, the counter, and last_error.
    res = await client.patch(
        f"{PATH}{uuid}",
        json={"enabled": True},
        headers={"Authorization": f"Bearer {auth_token}"},
    )
    assert res.status_code == 200
    body = res.json()
    assert body["enabled"] is True
    assert body["auto_disabled_at"] is None
    assert body["consecutive_failures"] == 0
    assert body["last_error"] is None


@pytest.mark.asyncio
async def test_viewer_forbidden(client: httpx.AsyncClient, viewer_token: str):
    res = await client.get(