A startup race between `decnet bus` becoming ready and the API's lifespan calling `get_app_bus()` at api.py:135 could set `_tried = True` permanently, poisoning the singleton for the rest of the process: the dashboard shows BUS OFFLINE, topology SSE falls into the bus-is-None snapshot-only branch, and mutator publish calls no-op. Only an API restart recovered it.

This change replaces the one-shot veto with a time-gated retry keyed on a monotonic `_last_failure_ts` timestamp plus a 2 s backoff. Publishers on the hot path still pay at most one connect attempt every 2 s while the bus is down, but the singleton auto-recovers within 5 s (one dashboard poll) once the bus comes up. The asyncio lock still serialises concurrent callers, so the bus server doesn't get stampeded with parallel connect attempts on startup.
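
For reviewers, a minimal sketch of the time-gated retry pattern described above. It is illustrative only and is not the code in this change: the `LazyBus` class, the injected `connect` and `clock` callables, and catching `OSError` as the connect failure are all assumptions made to keep the example self-contained. Only `_last_failure_ts`, the 2 s backoff, and the asyncio lock come from the description.

```python
import asyncio
import time

RETRY_BACKOFF_S = 2.0  # backoff value taken from the description above


class LazyBus:
    """Hypothetical holder standing in for the state behind get_app_bus()."""

    def __init__(self, connect, clock=time.monotonic):
        self._connect = connect        # assumed async factory returning a bus client
        self._clock = clock            # injectable so the demo can fake time
        self._bus = None
        self._last_failure_ts = None   # None means "no failed attempt on record"
        self._lock = asyncio.Lock()

    async def get(self):
        if self._bus is not None:      # fast path: already connected
            return self._bus
        async with self._lock:         # serialise concurrent callers
            if self._bus is not None:  # another caller connected while we waited
                return self._bus
            last = self._last_failure_ts
            if last is not None and self._clock() - last < RETRY_BACKOFF_S:
                return None            # still inside the backoff window
            try:
                self._bus = await self._connect()
            except OSError:            # assumption: failures surface as OSError
                self._last_failure_ts = self._clock()
                return None
            self._last_failure_ts = None
            return self._bus


async def demo():
    now = [0.0]        # fake monotonic clock, advanced by hand
    bus_up = [False]
    attempts = [0]

    async def connect():
        attempts[0] += 1
        if not bus_up[0]:
            raise ConnectionRefusedError("bus not ready")
        return "bus-client"

    holder = LazyBus(connect, clock=lambda: now[0])
    results = [await holder.get()]      # bus down: attempt 1 fails
    now[0] = 1.0
    results.append(await holder.get())  # inside backoff: no connect attempt
    bus_up[0] = True
    now[0] = 2.5
    results.append(await holder.get())  # backoff elapsed: attempt 2 succeeds
    results.append(await holder.get())  # cached client, no connect attempt
    return results, attempts[0]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The failure timestamp only gates when the next attempt may happen; unlike a one-shot `_tried` flag, it never blocks retries forever. Checking the backoff inside the lock means callers that queued behind a failed attempt return `None` immediately instead of each issuing their own connect.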