DECNET/decnet/correlation/event_kinds.py
anti 351a8939c3 feat(attackers): scanned vs. interacted service bucketing on detail page
Adds a new card on AttackerDetail: SCANNED · N services | INTERACTED
WITH · M services. Distinguishes port-scanners (N high, M=0) from
actual engagement (M>0) at a glance — the analyst's first question
when triaging a new attacker row.
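A minimal sketch of how the card's two counts, and the port-scanner shape they expose, could be derived from the bucketed result. `render_card` and `looks_like_port_scan` are hypothetical helper names that are not part of this commit, and the real AttackerDetail component may render the card differently:

```python
def render_card(buckets: dict[str, list[str]]) -> str:
    """Render the summary line from {"scanned": [...], "interacted": [...]}."""
    n = len(buckets.get("scanned", []))
    m = len(buckets.get("interacted", []))
    return f"SCANNED · {n} services | INTERACTED WITH · {m} services"


def looks_like_port_scan(buckets: dict[str, list[str]]) -> bool:
    """Port-scanner shape: at least one service touched, none engaged (M == 0)."""
    return bool(buckets.get("scanned")) and not buckets.get("interacted")
```

The triage signal is just M == 0 with N > 0; any interacted service (M > 0) flips the row to "actual engagement".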

Classifier lives in decnet/correlation/event_kinds.py, a single
source of truth for the event-type vocabulary:

- INTERACTION_EVENT_TYPES — command-family (command/exec/query/...),
  SMTP engagement (mail_from/rcpt_to/rcpt_denied/message_accepted),
  file/payload activity (file_captured/upload/download_attempt/retr),
  pub/sub (publish/subscribe), recorded TTY sessions.
- NOISE_EVENT_TYPES — DECNET-internal and protocol-framework noise
  (startup/shutdown/config_error/parse_error/protocol_error/unknown_*).
- Everything else defaults to scan. Conservative by design: new
  template verbs show up as "scanned" until explicitly promoted.
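The three-way classification with a conservative default comes down to two ordered membership checks. A trimmed sketch follows: the sets hold only a few verbs for illustration, and the injectable `interaction`/`noise` parameters are an addition made here (not in the module) purely to show what a "promotion" changes. `auth_attempt` is a hypothetical verb:

```python
from typing import Literal

EventKind = Literal["interaction", "scan", "noise"]

# Illustrative subsets only; the module's real sets are larger.
INTERACTION = frozenset({"command", "mail_from", "upload", "publish", "session_recorded"})
NOISE = frozenset({"startup", "shutdown", "parse_error", "unknown_opcode"})


def classify(
    event_type: str,
    interaction: frozenset[str] = INTERACTION,
    noise: frozenset[str] = NOISE,
) -> EventKind:
    if event_type in interaction:
        return "interaction"
    if event_type in noise:
        return "noise"
    # Unrecognised verbs (e.g. one a new template just started emitting) count as scan.
    return "scan"


# A new verb stays "scan" until it is deliberately added to the interaction set.
before = classify("auth_attempt")
after = classify("auth_attempt", interaction=INTERACTION | {"auth_attempt"})
```

Defaulting to "scan" means a missed verb under-credits an attacker (still visible on the dashboard) rather than over-crediting one.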

Bucket logic: a service is "interacted" if ≥1 of its events
classifies as interaction; otherwise "scanned" if ≥1 scan event;
noise-only services drop. Disjoint by construction.
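The precedence rule (interaction beats scan, scan beats noise) can also be written as a max over ranks. This is a sketch of that alternative formulation, not the commit's code, which uses explicit branches; the outcome is the same, and it makes order independence and disjointness easy to see, since each service ends up with exactly one rank. `connect` and `request` are hypothetical verbs that fall through to scan:

```python
from collections.abc import Iterable

# Illustrative vocabulary only.
INTERACTION = frozenset({"command", "mail_from", "retr"})
NOISE = frozenset({"startup", "parse_error"})

# Higher rank wins; a service keeps the best kind seen across its events.
RANK = {"noise": 0, "scan": 1, "interaction": 2}


def classify(event_type: str) -> str:
    if event_type in INTERACTION:
        return "interaction"
    if event_type in NOISE:
        return "noise"
    return "scan"


def bucket_by_rank(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    best: dict[str, int] = {}
    for service, event_type in pairs:
        rank = RANK[classify(event_type)]
        if rank > best.get(service, -1):
            best[service] = rank
    # Noise-only services (rank 0) land in neither list.
    return {
        "interacted": sorted(s for s, r in best.items() if r == RANK["interaction"]),
        "scanned": sorted(s for s, r in best.items() if r == RANK["scan"]),
    }


buckets = bucket_by_rank([
    ("ssh", "connect"),
    ("ssh", "command"),   # one interaction upgrades ssh
    ("http", "request"),  # scan only
    ("smtp", "startup"),  # noise only: dropped
])
```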

Deliberate no-schema path: compute on-the-fly in the detail endpoint
via SELECT DISTINCT service, event_type FROM logs. Small result set
(tens of pairs per attacker), cost is trivial vs. the existing
behavior/commands queries. Trade-off: one more DB round-trip per
detail view in exchange for zero ALTER TABLE migration pain and
immediate classifier-change feedback loop.
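A minimal sketch of the on-the-fly path, assuming a `logs` table with an `attacker_ip` column that scopes the query to one attacker. The real schema, column names beyond `service`/`event_type`, and repository API are not shown in this commit, and in-memory SQLite stands in for whatever database DECNET actually uses. `fetch_service_event_pairs` is a hypothetical name; its return value is the `(service, event_type)` shape `bucket_services` consumes:

```python
import sqlite3


def fetch_service_event_pairs(conn: sqlite3.Connection, attacker_ip: str) -> list[tuple[str, str]]:
    # One extra round-trip per detail view; DISTINCT keeps the result to tens of rows.
    rows = conn.execute(
        "SELECT DISTINCT service, event_type FROM logs WHERE attacker_ip = ?",
        (attacker_ip,),
    ).fetchall()
    return [(service, event_type) for service, event_type in rows]


# Illustrative data only (RFC 5737 documentation addresses).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (attacker_ip TEXT, service TEXT, event_type TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", [
    ("203.0.113.7", "ssh", "command"),
    ("203.0.113.7", "ssh", "command"),  # duplicate row collapses under DISTINCT
    ("203.0.113.7", "http", "request"),
    ("198.51.100.2", "ftp", "retr"),    # different attacker: excluded
])
pairs = fetch_service_event_pairs(conn, "203.0.113.7")
```

Since classification happens in Python after the query, editing either frozenset changes the next detail view immediately, with no stored column to backfill.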

Profiler's _COMMAND_EVENT_TYPES stays as-is (strict subset of
interactions that carry executable text), with a comment pointing at
the new canonical module.

Closes DEVELOPMENT.md "Attacker Intelligence §Service-Level Behavioral
Profiling — Services actively interacted with".
2026-04-24 17:12:20 -04:00

114 lines
3.8 KiB
Python

"""Classify RFC 5424 event_type strings as interaction vs. scan vs. noise.

Used by:

- The attacker detail endpoint to split services into "scanned" and
  "interacted with" buckets, distinguishing port scanners from
  attackers who actually engaged.
- The profiler worker to filter command-family events when extracting
  executed-command history.

Classification is conservative: an unknown event_type defaults to
``scan`` rather than ``interaction``. That way a new service template
emitting a fresh verb shows up as "scanned" on the dashboard — visible
but not over-credited. Adding it to ``INTERACTION_EVENT_TYPES`` is
always a deliberate promotion.
"""

from __future__ import annotations

from collections.abc import Iterable
from typing import Literal

# Events that mean the attacker did something past reconnaissance —
# executed a command, sent mail, uploaded a file, subscribed to a topic.
# A service with ≥1 of these from a given attacker is "interacted with".
INTERACTION_EVENT_TYPES: frozenset[str] = frozenset({
    # Shell / command-family — lifted from the profiler's original
    # command-extraction frozenset; this module is now the source of
    # truth for that vocabulary too.
    "command",
    "exec",
    "query",
    "input",
    "shell_input",
    "execute",
    "run",
    "sql_query",
    "redis_command",
    "ldap_search",
    # Meaningful SMTP engagement — once MAIL FROM / RCPT TO lands, the
    # attacker is trying to send mail, not just banner-grab.
    # message_accepted is the DATA-commit moment.
    "mail_from",
    "rcpt_to",
    "rcpt_denied",
    "message_accepted",
    # File / payload activity
    "file_captured",
    "upload",
    "download_attempt",
    "retr",  # FTP retrieve
    # Pub/sub operational use (vs. mere connection)
    "publish",
    "subscribe",
    # A recorded TTY session is always an interaction — sessrec only
    # writes when there was PTY input.
    "session_recorded",
})

# Events that are DECNET-internal or protocol-framework noise rather
# than attacker-caused signal. Dropped from both buckets.
NOISE_EVENT_TYPES: frozenset[str] = frozenset({
    "startup",
    "shutdown",
    "config_error",
    "parse_error",
    "unknown_packet",
    "unknown_opcode",
    "unknown_command",
    "protocol_error",
})

EventKind = Literal["interaction", "scan", "noise"]


def classify_event(event_type: str) -> EventKind:
    """Return the kind label for a single event_type string."""
    if event_type in INTERACTION_EVENT_TYPES:
        return "interaction"
    if event_type in NOISE_EVENT_TYPES:
        return "noise"
    return "scan"


def bucket_services(
    pairs: Iterable[tuple[str, str]],
) -> dict[str, list[str]]:
    """Group distinct service names into scanned vs. interacted buckets.

    *pairs* is an iterable of ``(service, event_type)`` tuples — the
    shape the repo returns from a ``SELECT DISTINCT service, event_type``
    query. A service is placed in ``interacted`` if any of its events
    classifies as interaction; otherwise in ``scanned`` if any event
    classifies as scan; noise-only services are dropped.

    Return shape: ``{"interacted": [...sorted...], "scanned": [...sorted...]}``.
    Buckets are disjoint by construction.
    """
    best: dict[str, EventKind] = {}
    for service, event_type in pairs:
        kind = classify_event(event_type)
        current = best.get(service)
        # Rank: interaction > scan > noise > unset. A service only ever
        # moves up this ranking, so input order does not matter.
        if current == "interaction":
            continue
        if kind == "interaction":
            best[service] = "interaction"
        elif kind == "scan":
            # current is "scan", "noise" or unset here; scan outranks all three.
            best[service] = "scan"
        elif current is None:
            # kind is "noise": record it only if nothing better has been seen.
            best[service] = "noise"

    interacted = sorted(s for s, k in best.items() if k == "interaction")
    scanned = sorted(s for s, k in best.items() if k == "scan")
    return {"interacted": interacted, "scanned": scanned}