feat(api): opaque 500 handler + error_id correlation for unhandled exceptions

Registers a generic @app.exception_handler(Exception) that catches anything
uncaught in route handlers / dependencies. Prod response is opaque:
{detail: 'Internal Server Error', error_id: <uuid4 hex>}. Dev mode
(DECNET_DEVELOPER=True) adds exception_type and traceback fields so
failures are debuggable without tailing server logs.

The error_id is logged alongside the full traceback server-side, letting
operators correlate a user's 500 report with the exact exception via
`grep <error_id> /var/log/decnet.log`.
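
The correlation described above can be sketched with the standard library alone. The logger name `decnet.sketch`, the in-memory stream, the request path and the `handle_failure` helper are hypothetical stand-ins for illustration; the real handler logs through the application's own `log` object.

```python
import io
import logging
import uuid

# Hypothetical logger that writes to memory instead of /var/log/decnet.log.
stream = io.StringIO()
log = logging.getLogger("decnet.sketch")
log.setLevel(logging.ERROR)
log.propagate = False
log.addHandler(logging.StreamHandler(stream))


def handle_failure(method: str, path: str) -> str:
    """Log the exception currently being handled and return a fresh error_id."""
    error_id = uuid.uuid4().hex
    # Logger.exception appends the active traceback after the message line.
    log.exception("unhandled exception on %s %s [error_id=%s]", method, path, error_id)
    return error_id


try:
    raise KeyError("missing-key")
except KeyError:
    reported_id = handle_failure("GET", "/api/v1/example")

# Rough equivalent of `grep <error_id>`: find the log line carrying the id.
matches = [line for line in stream.getvalue().splitlines() if reported_id in line]
print(matches[0])
```

Only the message line carries the id; the traceback sits on the lines that follow it, so a context flag such as `grep -A 30 <error_id>` is likely to be more useful in practice than a bare grep.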

FastAPI's own HTTPException routing and the existing
RequestValidationError / ValidationError / RateLimitExceeded handlers
still take precedence — this handler only fires on genuinely-uncaught
exceptions.
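
One way to picture why the more specific handlers win is most-specific-class dispatch over the exception's MRO. The following is a simplified model under that assumption, not FastAPI's or Starlette's actual code; `HTTPError`, `NotFound`, `lookup_handler` and `dispatch` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Type

Handler = Callable[[BaseException], str]


class HTTPError(Exception):
    """Hypothetical stand-in for an HTTP-style exception."""


class NotFound(HTTPError):
    """Hypothetical subclass; it should reach the HTTPError handler."""


# Registry keyed by exception class, with a catch-all entry for Exception.
handlers: Dict[Type[BaseException], Handler] = {
    HTTPError: lambda exc: "http handler",
    Exception: lambda exc: "catch-all 500 handler",
}


def lookup_handler(exc: BaseException) -> Optional[Handler]:
    """Return the handler registered for the most specific class of exc."""
    for cls in type(exc).__mro__:
        if cls in handlers:
            return handlers[cls]
    return None


def dispatch(exc: BaseException) -> str:
    """Run the matching handler, or report that nothing matched."""
    handler = lookup_handler(exc)
    return handler(exc) if handler else "unhandled"


print(dispatch(NotFound()))           # specific handler wins
print(dispatch(ZeroDivisionError()))  # falls through to the catch-all
```

In this model the catch-all entry is only reached when no closer class in the MRO has a handler, which matches the behaviour the commit message describes.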

Flips threat model F1/I 'traceback / stack trace leakage' from ? to M
and logs a follow-up checklist entry for 4 detail=str(e) sites in the
fleet deploy router (admin-gated, different threat class, separate
audit).
2026-04-23 14:07:32 -04:00
parent 2f4f81e5de
commit ef4179ea1f
3 changed files with 161 additions and 2 deletions


@@ -1,5 +1,7 @@
import asyncio
import os
import traceback
import uuid
from contextlib import asynccontextmanager
from typing import Any, AsyncGenerator, Optional
@@ -297,3 +299,29 @@ async def pydantic_validation_exception_handler(request: Request, exc: Validatio
"type": "internal_validation_error"
},
)


@app.exception_handler(Exception)
async def unhandled_exception_handler(request: Request, exc: Exception) -> ORJSONResponse:
    """Catch-all for uncaught exceptions in route handlers and dependencies.

    Prod: opaque 500 with an ``error_id``; full traceback goes ONLY to server
    logs. Dev (``DECNET_DEVELOPER=True``): same response plus ``exception_type``
    and ``traceback`` fields so failures are debuggable without tailing logs.

    The ``error_id`` lets operators correlate a user's 500 report with the full
    traceback in server logs (``grep <error_id> /var/log/decnet.log``).

    FastAPI's own ``HTTPException`` routing still takes precedence; this
    handler only fires on genuinely-uncaught exceptions.
    """
    error_id = uuid.uuid4().hex
    log.exception(
        "unhandled exception on %s %s [error_id=%s]",
        request.method, request.url.path, error_id,
    )

    body: dict[str, Any] = {"detail": "Internal Server Error", "error_id": error_id}
    if DECNET_DEVELOPER:
        body["exception_type"] = type(exc).__name__
        body["traceback"] = traceback.format_exc()

    return ORJSONResponse(status_code=500, content=body)
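
The prod/dev response contract of the handler above can be checked without a running server by isolating the body construction in a pure function. This is a minimal sketch, assuming a hypothetical `build_error_body` helper that does not exist in the commit; it also formats the traceback from the exception object itself rather than via `traceback.format_exc()`, so it does not depend on being called inside an `except` block.

```python
import traceback
import uuid
from typing import Any, Dict


def build_error_body(exc: BaseException, developer: bool) -> Dict[str, Any]:
    """Hypothetical helper mirroring the handler's response body logic."""
    body: Dict[str, Any] = {
        "detail": "Internal Server Error",
        "error_id": uuid.uuid4().hex,
    }
    if developer:
        # Dev mode only: expose the exception type and the formatted traceback.
        body["exception_type"] = type(exc).__name__
        body["traceback"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return body


try:
    {}["secret-internal-key"]  # deliberately raises KeyError
except KeyError as err:
    prod_body = build_error_body(err, developer=False)
    dev_body = build_error_body(err, developer=True)

print(sorted(prod_body))
print(sorted(dev_body))
```

The prod body carries only `detail` and `error_id`, so nothing from the exception message or stack can reach the client; the dev body adds the two debugging fields.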