Extends tracing to every remaining module: all 23 API route handlers,
correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp),
profiler behavioral analysis, logging subsystem, engine, and mutator.
Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on
the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace
correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords.
Includes development/docs/TRACING.md with full span reference (76 spans),
pipeline propagation architecture, quick start guide, and troubleshooting.
The active prober emits tcpfp_fingerprint events with TTL, window, MSS etc.
from the attacker's SYN-ACK. These were invisible to the behavioral profiler
for two reasons:
1. target_ip (prober's field name for attacker IP) was not in _IP_FIELDS in
collector/worker.py or correlation/parser.py, so the profiler re-parsed
raw_lines and got attacker_ip=None, never attributing prober events to
the attacker profile.
2. sniffer_rollup only handled tcp_syn_fingerprint (passive sniffer) and
ignored tcpfp_fingerprint (active prober). Prober events use different
field names: window_size/window_scale/sack_ok vs window/wscale/has_sack.
Changes:
- Add target_ip to _IP_FIELDS in collector and parser
- Add _PROBER_TCPFP_EVENT and _INITIAL_TTL table to behavioral.py
- sniffer_rollup now processes tcpfp_fingerprint: maps field names, derives
OS from TTL via _os_from_ttl, computes hop_distance = initial_ttl - observed
- Expand prober DEFAULT_TCPFP_PORTS to [22,80,443,8080,8443,445,3389] for
better SYN-ACK coverage on attacker machines
- Add 4 tests covering prober OS detection, hop distance, and field mapping
Templates for http, https, k8s, and docker_api log the client IP as
remote_addr (Flask's request.remote_addr) instead of src_ip. The collector
and correlation parser only checked src_ip/src/client_ip/remote_ip/ip, so
every request event from those services was stored with attacker_ip="Unknown"
and never associated with any attacker profile.
Adding remote_addr to _IP_FIELDS in both collector/worker.py and
correlation/parser.py fixes attribution. The profiler cursor was also reset
to 0 so the worker performs a cold rebuild and re-ingests existing events with
the corrected field mapping.
Reverts commits 8c249f6, a6c7cfd, 7ff5703. The SSH log relay approach
requires container redeployment and doesn't retroactively fix existing
attacker profiles. Rolling back to reassess the approach.
New log_relay.py replaces raw 'cat' on the rsyslog pipe. Intercepts
sshd and bash lines and re-emits them as structured RFC 5424 events:
login_success, session_opened, disconnect, connection_closed, command.
Parsers updated to accept non-nil PROCID (sshd uses PID).
The SSH honeypot logs commands via PROMPT_COMMAND logger as:
<14>1 ... bash - - - CMD uid=0 pwd=/root cmd=ls
These lines had service=bash and event_type=-, so the attacker worker
never recognized them as commands. Both the collector and correlation
parsers now detect the CMD pattern and normalize to service=ssh,
event_type=command, with uid/pwd/command in fields.