Add Unit 11: Logging and Syslog wiki page

2026-04-18 06:05:05 -04:00
parent d6c5351747
commit ff8dad3895

# Logging and Syslog
DECNET speaks RFC 5424 everywhere. Every control-plane log line, every decky
honeypot event, and every forwarded message uses the same wire format so a
single parser (Logstash, rsyslog, the bundled ingester) can consume it end to
end.
## RFC 5424 formatter
The control-plane formatter lives in `decnet/config.py`
(`Rfc5424Formatter`). Its output is:
```
<PRIVAL>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
```
Field rules:
- `PRIVAL` = `facility * 8 + severity`. Facility is fixed at **local0**
(16), so every line begins with `<13x>` where `x` depends on severity.
- Version is always `1`.
- Timestamp is the record's `created` time, rendered as an ISO-8601 UTC
string with **microsecond** precision
(`.isoformat(timespec="microseconds")`).
- `HOSTNAME` is captured once at import time via `socket.gethostname()`.
- `APP-NAME` defaults to `decnet`, overridable per record via the
`decnet_component` attribute (set by the `ComponentAdapter` in
`decnet/logging/__init__.py`).
- `PROCID` is the live `os.getpid()`.
- `MSGID` is the Python logger name (e.g. `decnet.config`).
- `STRUCTURED-DATA` is the NILVALUE `-` for control-plane logs.
- `MSG` is the formatted record, with exception tracebacks appended on a
newline when present.
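The field rules above can be sketched as a `logging.Formatter` subclass. This is a minimal illustration, not the project's actual `Rfc5424Formatter`; the class body, severity table, and defaults are assumptions based on the rules listed here.

```python
import logging
import os
import socket
from datetime import datetime, timezone

FACILITY_LOCAL0 = 16
SEVERITY = {  # Python level -> RFC 5424 severity (see table below)
    logging.CRITICAL: 2,
    logging.ERROR: 3,
    logging.WARNING: 4,
    logging.INFO: 6,
    logging.DEBUG: 7,
}
HOSTNAME = socket.gethostname()  # captured once at import time


class Rfc5424Formatter(logging.Formatter):
    """Sketch of an RFC 5424 control-plane formatter (assumed code)."""

    def format(self, record: logging.LogRecord) -> str:
        severity = SEVERITY.get(record.levelno, 6)
        prival = FACILITY_LOCAL0 * 8 + severity
        # record.created is a POSIX timestamp; render as ISO-8601 UTC
        # with microsecond precision.
        ts = datetime.fromtimestamp(
            record.created, tz=timezone.utc
        ).isoformat(timespec="microseconds")
        app = getattr(record, "decnet_component", "decnet")
        msg = record.getMessage()
        if record.exc_info:  # append traceback on a newline when present
            msg += "\n" + self.formatException(record.exc_info)
        return (
            f"<{prival}>1 {ts} {HOSTNAME} {app} {os.getpid()} "
            f"{record.name} - {msg}"
        )
```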
### Severity map
| Python level | RFC 5424 severity |
|--------------|-------------------|
| CRITICAL | 2 (Critical) |
| ERROR | 3 (Error) |
| WARNING | 4 (Warning) |
| INFO | 6 (Informational) |
| DEBUG | 7 (Debug) |
### Example line
```
<134>1 2026-04-12T21:48:03.123456+00:00 host decnet 1234 decnet.config - Dev mode active
```
`134` decodes as facility 16 (local0) × 8 + severity 6 (INFO).
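The decoding is exact in the other direction too, because severity occupies the low three bits:

```python
# PRIVAL = facility * 8 + severity, with severity in 0..7,
# so divmod recovers both components.
facility, severity = divmod(134, 8)
print(facility, severity)  # → 16 6  (local0, Informational)
```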
## Handlers installed by `_configure_logging`
`decnet/config.py::_configure_logging(dev)` runs at import time, gated on
`DECNET_DEVELOPER`. It is idempotent — if an RFC 5424 `StreamHandler` is
already on the root logger, it returns.
Installed handlers:
1. A stderr `StreamHandler` with `Rfc5424Formatter`. Root level is
`DEBUG` when `dev=True`, otherwise `INFO`.
2. An `InodeAwareRotatingFileHandler`
(`decnet/logging/inode_aware_handler.py`) pointed at
`DECNET_SYSTEM_LOGS` (default `decnet.system.log` in `$PWD`),
`maxBytes=10 MB`, `backupCount=5`, `encoding="utf-8"`. Skipped when any
`PYTEST*` environment variable is set.
`InodeAwareRotatingFileHandler` extends the stdlib `RotatingFileHandler`
with a cheap `os.stat` on every emit: if the file's `(st_ino, st_dev)`
differ from the held fd, the handler closes and reopens. This survives
`logrotate` (without copytruncate), `rm`, and sudo-induced ownership
flips without losing lines, and it falls back to `handleError` rather
than crashing if it cannot reopen.
## Root-chown under sudo
When deploy runs as root (required for MACVLAN/IPVLAN), the log file is
created root-owned. `decnet/privdrop.py::chown_to_invoking_user` is
called right after the file handler is wired up in `_configure_logging`
— it honours `SUDO_UID` / `SUDO_GID` and hands the file back to the
invoking user so a subsequent non-root `decnet api` or `decnet status`
can still append. The honeypot log file handler
(`decnet/logging/file_handler.py`) additionally calls
`chown_tree_to_invoking_user` on the log directory.
## `--log-target HOST:PORT` forwarding
Service plugins emit directly to the aggregator; DECNET itself stays
agnostic about what listens. `decnet/logging/forwarder.py` exposes:
- `parse_log_target("ip:port") -> (host, port)` — rejects anything that
doesn't split on a trailing `:port` with a digit-only port.
- `probe_log_target(log_target, timeout=2.0)` — a non-fatal TCP connect
used at deploy time to warn the operator if the target is unreachable.
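Both helpers are small; a sketch under the rules stated above (split on the last `:`, digit-only port, connect-and-warn rather than fail):

```python
import socket


def parse_log_target(value: str) -> tuple[str, int]:
    """Split 'host:port' on the trailing colon; reject non-digit ports."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"invalid log target: {value!r}")
    return host, int(port)


def probe_log_target(log_target: str, timeout: float = 2.0) -> bool:
    """Non-fatal TCP reachability check used at deploy time."""
    host, port = parse_log_target(log_target)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```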
The CLI accepts the value and injects it into every service plugin's
`compose_fragment(...)` as the `log_target` kwarg. Plugins (see
`decnet/services/base.py` and concrete services like `ssh.py`,
`smtp.py`, `sniffer.py`, `mysql.py`, `redis.py`,
`elasticsearch.py`, `https.py`) add it to the container's environment as
`LOG_TARGET=ip:port`. The in-container emit helpers (for example
`templates/ssh/emit_capture.py`) read `LOG_TARGET` and write a
structured-data RFC 5424 line per event. The per-service formatter lives
in `decnet/logging/syslog_formatter.py::format_rfc5424`, which uses
facility `local0`, PEN `relay@55555` for structured data, and escapes
SD-PARAM-VALUE per RFC 5424 §6.3.3.
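A sketch of such a per-event formatter, assuming the signature (the real one is `decnet/logging/syslog_formatter.py::format_rfc5424`); the escaping follows RFC 5424 §6.3.3, which requires `\`, `"`, and `]` to be backslash-escaped inside SD-PARAM-VALUE:

```python
import os
import socket
from datetime import datetime, timezone

SD_ID = "relay@55555"  # PEN-qualified SD-ID used for honeypot events


def _escape_sd_value(value: str) -> str:
    """Escape per RFC 5424 section 6.3.3."""
    return (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")
    )


def format_rfc5424(app_name: str, msg: str, params: dict[str, str],
                   severity: int = 6) -> str:
    """Emit one structured-data event line, facility local0 (sketch)."""
    prival = 16 * 8 + severity
    ts = datetime.now(timezone.utc).isoformat(timespec="microseconds")
    sd = "[" + SD_ID + "".join(
        f' {k}="{_escape_sd_value(v)}"' for k, v in params.items()
    ) + "]"
    return (f"<{prival}>1 {ts} {socket.gethostname()} {app_name} "
            f"{os.getpid()} - {sd} {msg}")
```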
Deckies typically emit to two destinations simultaneously: a local
rotating file (`decnet/logging/file_handler.py`, default
`/var/log/decnet/decnet.log`, same 10 MB × 5 rotation, controlled by
`DECNET_LOG_FILE`) and the remote `LOG_TARGET` when set.
## Ingestion pipeline
`decnet/web/ingester.py::log_ingestion_worker` is a FastAPI background
task that tails the JSON sidecar of the honeypot log
(`<DECNET_INGEST_LOG_FILE>.json`) and bulk-inserts rows into the
repository.
Batching is governed by two env vars (see
[Environment variables](Environment-Variables)):
- `DECNET_BATCH_SIZE` (default `100`) — flush when the accumulated
batch hits this row count.
- `DECNET_BATCH_MAX_WAIT_MS` (default `250`) — flush when this many
milliseconds have passed since the batch started, even if smaller.
The worker persists its byte-offset in the repository under
`ingest_worker_position`, so restarts resume where they left off.
Partial trailing lines are deferred to the next iteration, and
truncation (detected as `st_size < position`) resets the offset to 0.
Each record gets an OpenTelemetry child span chained off the
collector's span via `extract_context`, so the full journey from packet
capture to DB insert is visible in Jaeger — see
[Tracing and profiling](Tracing-and-Profiling).
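The partial-line deferral amounts to never advancing the offset past the last complete newline. A sketch with a hypothetical helper (the real worker is `log_ingestion_worker`):

```python
import io


def read_complete_lines(f, position: int):
    """Seek to the saved offset and return only whole lines; a partial
    trailing line is left in place by not advancing past it (sketch)."""
    f.seek(position)
    data = f.read()
    end = data.rfind(b"\n")
    if end == -1:
        return [], position  # no complete line yet; re-read next time
    return data[: end + 1].splitlines(), position + end + 1
```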
`_flush_batch` commits the rows via `repo.add_logs(...)`, runs
`_extract_bounty` on each entry (credentials, JA3/JA3S/JA4, JARM,
HASSHServer, TCP/IP fingerprints, TLS certificates, VNC/SSH banners,
HTTP User-Agents), and finally updates the saved position. If the task
is being cancelled during lifespan teardown, it bails out before
touching the DB so the un-committed tail is re-read next startup
instead of being lost.
## From DB to dashboard
Rows land in the repository's `logs` table and are served by the
`/api/logs` endpoints. The live-logs page streams them over
Server-Sent Events and the dashboard renders aggregates
(per-service counts, attackers, bounties). See the
[Web dashboard](Web-Dashboard) for the UI side.