Files
DECNET/decnet/engine
anti d595240f55 fix(engine): post-deploy verify topology containers, mark DEGRADED on boot crash
deploy_topology was flipping to ACTIVE the moment 'compose up -d'
returned 0, but compose returns 0 as soon as containers are *started*.
A service that crashes on boot (port bind failure, bad image, missing
entrypoint) left the topology row sitting at ACTIVE indefinitely while
half the substrate was dead.

After compose returns, we now run 'compose ps --all --format json',
parse the newline-delimited per-container rows, and downgrade to
DEGRADED with a reason listing the first eight unhealthy containers if
anything isn't in state='running'.  Operators see real state on the
topology page instead of an optimistic flag.

_compose_ps swallows compose-level errors (returns []) so an unrelated
docker hiccup doesn't gate the success path — the existing in-flight
exception path still catches genuine deploy failures with FAILED.
2026-04-28 23:39:50 -04:00
..