Commit Graph

4 Commits

Author SHA1 Message Date
f6e67c036d fix(types): T5 — narrow AsyncClient|None with inline if; rename loop variable t→task to avoid no-redef 2026-05-01 02:18:57 -04:00
efc98285aa fix(webhook/worker): self-heal when bus starts late or restarts
Before: if the bus was unreachable at worker start, we logged
"running in idle mode" once and parked on shutdown forever. systemd
doesn't guarantee bus is fully up before the webhook worker starts,
so a race on boot left the worker permanently dead until restart.

Now: wrap the whole bus-use in an outer reconnect loop.

  while not shutdown:
    try: connect()
    except: sleep(RECONNECT_SECS) ; continue
    try: run_with_bus(...)       # heartbeat + dispatch
    except: log+close ; reconnect on next iter

Clean consequence: if the bus dies mid-operation the dispatch loop's
subscriptions raise inside the consumer tasks, `_run_with_bus` exits,
the outer loop closes the stale connection and reconnects. No partial
state leaks across epochs — fresh bus, fresh subs, fresh heartbeat.

Interval is 60s by default, overridable via
DECNET_WEBHOOK_BUS_RECONNECT_SECS. Shutdown wakes the wait so
systemctl stop doesn't hang for a minute.

Test added: flaky get_bus that fails once, then returns a live
FakeBus — asserts retry + successful delivery.

get_app_bus() in decnet/bus/app.py already has a 2s backoff retry so
the FastAPI hot path self-heals; this commit brings the standalone
webhook worker in line with the same posture.
2026-04-24 16:39:38 -04:00
2bcef50ac5 feat(webhooks): circuit breaker auto-disables misbehaving subscriptions
After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed
deliveries, the worker calls trip_webhook_circuit(uuid, ts) which
flips enabled=False and stamps auto_disabled_at. The worker sets its
reload flag so the next dispatch epoch stops consuming events for the
tripped sub entirely — one dead receiver can't poison the shared
egress pool anymore.

Operator clears the trip via PATCH — setting enabled=True when the
sub was previously disabled clears auto_disabled_at, zeros
consecutive_failures, and clears last_error. Admin-pause → re-enable
hits the same path harmlessly.

Three observable states now distinguishable in the UI:
- Active              enabled=True,  auto_disabled_at=NULL
- Admin-paused        enabled=False, auto_disabled_at=NULL
- Tripped             enabled=False, auto_disabled_at=<ts>

UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and
a "N TRIPPED" count in the page header. Hover tooltip tells the
operator how to reset ("Re-enable via Edit").

record_webhook_failure now returns the new consecutive_failures count
so the worker can compare against the threshold without a second
roundtrip. trip_webhook_circuit is idempotent — re-tripping just
re-stamps auto_disabled_at.

Closes THREAT_MODEL WH-02 and DEBT-037 §1.
2026-04-24 16:24:33 -04:00
e6127a81a1 feat(webhook): worker + CLI + systemd unit
Introduces the `decnet webhook` long-running worker that consumes the
internal bus and POSTs matching events to configured subscriptions.

Design: one task per (subscription, pattern) pair. Each task opens
its own bus subscription, iterates events, and dispatches via the
shared deliver() client. No intermediate queue, no in-memory filter
matching — the bus's own pattern matcher is the filter. Reloads on
`system.webhook.subscriptions_changed` signals from the CRUD router,
with a 60s fallback timer in case a signal is lost.

Shutdown propagates via CancelledError on the outer task; all inner
subscription tasks are cancelled and awaited in a finally block.
Bus unavailable → worker stays up in idle mode per the DEBT-031
pattern, logging one warning.

Registered as a master-only CLI command (agents don't configure
webhooks — the subscription store lives on master). systemd unit
mirrors the profiler template; added to decnet.target Wants= list so
`systemctl start decnet.target` brings it up alongside everything
else. `decnet init` auto-picks up the new .service.j2 via its
existing `glob("decnet-*.service.j2")` sweep.
2026-04-24 15:46:11 -04:00