feat(webhooks): circuit breaker auto-disables misbehaving subscriptions
After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed
deliveries, the worker calls trip_webhook_circuit(uuid, ts) which
flips enabled=False and stamps auto_disabled_at. The worker sets its
reload flag so the next dispatch epoch stops consuming events for the
tripped sub entirely — one dead receiver can't poison the shared
egress pool anymore.
Operator clears the trip via PATCH — setting enabled=True when the
sub was previously disabled clears auto_disabled_at, zeros
consecutive_failures, and clears last_error. Admin-pause → re-enable
hits the same path harmlessly.
Three observable states now distinguishable in the UI:
- Active enabled=True, auto_disabled_at=NULL
- Admin-paused enabled=False, auto_disabled_at=NULL
- Tripped enabled=False, auto_disabled_at=<ts>
UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and
a "N TRIPPED" count in the page header. Hover tooltip tells the
operator how to reset ("Re-enable via Edit").
record_webhook_failure now returns the new consecutive_failures count
so the worker can compare against the threshold without a second
roundtrip. trip_webhook_circuit is idempotent — re-tripping just
re-stamps auto_disabled_at.
Closes THREAT_MODEL WH-02 and DEBT-037 §1.
This commit is contained in:
@@ -41,6 +41,13 @@ class WebhookSubscription(SQLModel, table=True):
|
||||
last_success_at: Optional[datetime] = None
|
||||
last_failure_at: Optional[datetime] = None
|
||||
last_error: Optional[str] = None
|
||||
# Set when the circuit breaker auto-disables the subscription after
|
||||
# too many consecutive failures. NULL means "not tripped" — the
|
||||
# subscription is either active (enabled=True) or admin-paused
|
||||
# (enabled=False, auto_disabled_at=NULL). A non-NULL stamp with
|
||||
# enabled=False means the worker tripped it; the operator clears
|
||||
# the flag by re-enabling via PATCH.
|
||||
auto_disabled_at: Optional[datetime] = None
|
||||
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
|
||||
updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
|
||||
|
||||
@@ -100,6 +107,7 @@ class WebhookResponse(BaseModel):
|
||||
last_success_at: Optional[datetime] = None
|
||||
last_failure_at: Optional[datetime] = None
|
||||
last_error: Optional[str] = None
|
||||
auto_disabled_at: Optional[datetime] = None
|
||||
created_at: datetime
|
||||
updated_at: datetime
|
||||
warnings: List[str] = PydanticField(default_factory=list)
|
||||
|
||||
Reference in New Issue
Block a user