feat(webhooks): circuit breaker auto-disables misbehaving subscriptions
After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed
deliveries, the worker calls trip_webhook_circuit(uuid, ts) which
flips enabled=False and stamps auto_disabled_at. The worker sets its
reload flag so the next dispatch epoch stops consuming events for the
tripped sub entirely — one dead receiver can't poison the shared
egress pool anymore.
Operator clears the trip via PATCH — setting enabled=True when the
sub was previously disabled clears auto_disabled_at, zeros
consecutive_failures, and clears last_error. Admin-pause → re-enable
hits the same path harmlessly.
Three observable states now distinguishable in the UI:
- Active enabled=True, auto_disabled_at=NULL
- Admin-paused enabled=False, auto_disabled_at=NULL
- Tripped enabled=False, auto_disabled_at=<ts>
UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and
a "N TRIPPED" count in the page header. Hover tooltip tells the
operator how to reset ("Re-enable via Edit").
record_webhook_failure now returns the new consecutive_failures count
so the worker can compare against the threshold without a second
roundtrip. trip_webhook_circuit is idempotent — re-tripping just
re-stamps auto_disabled_at.
Closes THREAT_MODEL WH-02 and DEBT-037 §1.
This commit is contained in:
@@ -1829,26 +1829,41 @@ class SQLModelRepository(BaseRepository):
|
||||
|
||||
async def record_webhook_failure(
|
||||
self, uuid: str, ts: datetime, error: str
|
||||
) -> None:
|
||||
) -> int:
|
||||
async with self._session() as session:
|
||||
# Read current failure count, bump, write. Small race window on
|
||||
# concurrent deliveries to the same subscription is acceptable —
|
||||
# the counter informs the circuit-breaker heuristic (DEBT-037),
|
||||
# not a correctness invariant.
|
||||
# the counter informs the circuit-breaker heuristic, not a
|
||||
# correctness invariant.
|
||||
result = await session.execute(
|
||||
select(WebhookSubscription.consecutive_failures).where(
|
||||
WebhookSubscription.uuid == uuid
|
||||
)
|
||||
)
|
||||
current = result.scalar_one_or_none() or 0
|
||||
new_count = current + 1
|
||||
await session.execute(
|
||||
update(WebhookSubscription)
|
||||
.where(WebhookSubscription.uuid == uuid)
|
||||
.values(
|
||||
consecutive_failures=current + 1,
|
||||
consecutive_failures=new_count,
|
||||
last_failure_at=ts,
|
||||
last_error=error[:512] if error else None,
|
||||
updated_at=ts,
|
||||
)
|
||||
)
|
||||
await session.commit()
|
||||
return new_count
|
||||
|
||||
async def trip_webhook_circuit(self, uuid: str, ts: datetime) -> None:
|
||||
async with self._session() as session:
|
||||
await session.execute(
|
||||
update(WebhookSubscription)
|
||||
.where(WebhookSubscription.uuid == uuid)
|
||||
.values(
|
||||
enabled=False,
|
||||
auto_disabled_at=ts,
|
||||
updated_at=ts,
|
||||
)
|
||||
)
|
||||
await session.commit()
|
||||
|
||||
Reference in New Issue
Block a user