Commit Graph

1136 Commits

Author SHA1 Message Date
e52a0e0381 feat(profiler/behave_shell): emit cognitive.inter_command_latency_class
BEHAVE-EXTRACTOR.md Phase A Step 5. Classifies the operator's
thinking pace between commands. Splits LW-sim / CLAUDE-FF /
CLAUDE-CL.

* _features/cognitive.py:inter_command_latency_class(ctx) emits one
  Observation in {instant, typing_speed, deliberate,
  llm_lightweight, llm_heavyweight, long}, computed as the median
  of ctx.inter_cmd_iats bucketed against the prototype thresholds
  (v0.2 split: lightweight 2-8s, heavyweight 8-30s).
* Sample-size honesty: < 5 commands halves confidence (0.40 vs
  0.80) per BEHAVE-EXTRACTOR.md.
* Threshold consts (INTER_CMD_*_MAX, MIN_COMMANDS_FOR_FULL_CONFIDENCE,
  plus parked Step 6/7/8 thresholds for the next three commits)
  added to _thresholds.py.

Tests cover all six buckets at empirically-anchored IATs (15s ≈
Claude Opus driving recon via tmux send-keys), plus the
single-command no-IAT and low-sample-count paths.
2026-05-03 07:52:39 -04:00
f3880b24d1 feat(profiler/behave_shell): command segmentation in SessionContext
BEHAVE-EXTRACTOR.md Phase A Step 4. Pure refactor inside _ctx.py —
no new feature emits. Lays the shared utility for the three
cognitive primitives next in line (Steps 5-7).

* Command dataclass (frozen): start_ts, end_ts, first_token_hash.
  PII-safe by construction — only the first whitespace-delimited
  token of the command is retained, and only as a sha256 hash
  (decnet/profiler/behave_shell/_parse.py:hash_token).
* _segment_commands walks input events char-by-char, splits on
  \r / \n, hashes the first token, drops the rest.
* SessionContext gains commands, inter_cmd_iats, output_per_cmd.
  output_per_cmd[i] counts bytes between commands[i].end_ts and
  commands[i+1].start_ts — the natural pairing for Step 7
  (feedback_loop_engagement).

Tests: empty / unterminated streams, single command (CR + LF
terminators), paste-with-newline, multi-command IAT pairing,
output-byte counting between boundaries, blank-line skip,
first-token-only PII discipline.
2026-05-03 07:50:55 -04:00
6763fceb0b feat(profiler/behave_shell): emit motor.paste_burst_rate
BEHAVE-EXTRACTOR.md Phase A Step 3. Same paste-event ratio as
motor.input_modality but coarser-bucketed: this is the *habit*
signal (does the operator reach for paste at all?), where
input_modality is the dominant-channel signal.

* _features/motor.py:paste_burst_rate(ctx) emits one Observation
  per session in {none, occasional, habitual} with confidence
  0.70 / 0.70 / 0.80.
* Thresholds: PASTE_RATE_OCCASIONAL_MIN=0.10,
  PASTE_RATE_HABITUAL_MIN=0.50.

Splits YOU-sim from LW/CLAUDE-FF/CLAUDE-CL — LLM-driven sessions
paste habitually, real humans rarely paste.

Tests: pure-typed → none; 1-paste-in-10 → occasional;
paste-majority → habitual; output-only → no observation; habitual
confidence > occasional confidence.
2026-05-03 07:49:03 -04:00
879f5e731b feat(profiler/behave_shell): emit motor.input_modality
BEHAVE-EXTRACTOR.md Phase A Step 2. The first primitive — picked
first because it has the highest discriminative value (HUMAN vs
everyone) and the simplest implementation (paste-event ratio over
total inputs).

* _features/motor.py:input_modality(ctx) emits one Observation
  per session in {typed, pasted, mixed} with confidence 0.75 / 0.70.
* _features/_emit.py centralises the make_observation helper so
  every feature module gets the same Window/source/evidence_ref
  boilerplate without copy-paste.
* Thresholds inherited from the prototype's calibration history
  (MODALITY_PASTED_MIN=0.40, MODALITY_TYPED_MAX=0.05).
* Zero-input session skips emission — registry doesn't admit
  "unknown" here.

Tests: pure-typed → typed, pure-pasted → pasted, mixed → mixed,
output-only session → no observation, full envelope round-trip.
2026-05-03 07:47:38 -04:00
c9a81a23c2 feat(profiler/behave_shell): asciinema parser + paste-burst detection
BEHAVE-EXTRACTOR.md Phase A Step 1. Lays the shared primitives that
Steps 2-3 (motor.input_modality, motor.paste_burst_rate) will
consume:

* parse_shard_line / parse_shard turn a shard JSONL line/file into
  AsciinemaEvents, skipping headers and malformed records.
* PasteBurst dataclass + _detect_paste_bursts group consecutive
  paste-class input events (len(d) >= 4 chars per the prototype's
  empirical floor) into contiguous bursts, splitting on IAT gaps
  larger than PASTE_BURST_MAX_IAT_S (200ms).
* SessionContext now carries iats and paste_bursts derivations.
* Threshold constants harvested from
  BEHAVE/prototype_extractors/shell/extract.py — calibrated against
  the five 2026-05-02 shards.

Tests cover pure-typed, pure-pasted, mixed streams; close vs far
paste events; typed events breaking a burst; PasteBurst immutability;
and the JSON parser's junk handling.
2026-05-03 07:46:01 -04:00
f8eae04e5d feat(profiler/behave_shell): scaffold extract_session entry point
BEHAVE-EXTRACTOR.md Phase A Step 0. Lays the package skeleton
(__init__/extract/_parse/_ctx/_thresholds/_features) with empty
FEATURES = (), so the worker plumbing in BEHAVE-INTEGRATION Phase 4
has a stable import path before any primitive lands.

extract_session() builds a SessionContext once and fans the
registered feature functions across it; at Step 0 that fan-out is
empty and the function yields nothing. Step 1 (asciinema parser +
paste-burst detector) and Step 2 (motor.input_modality) land next.

Smoke suite asserts the empty contract: empty stream → no
observations, single event → t_start == t_end, multi-event → events
routed into input_events / output_events by kind, evidence_ref
defaults to "session:<sid>" or honours an explicit override.
2026-05-03 07:42:09 -04:00
a2a61b636e feat(web): drop SessionProfile, wire observations into AttackerDetail (DEBT-050 / DEBT-036 closure)
Destructive half of BEHAVE-INTEGRATION.md Phase 1. SessionProfile +
its kd_* columns + the dialect ALTER TABLE migration helpers are
deleted outright; pre-v1, the table shipped empty, no migration
ceremony required (per the no-new-_migrate_-pre-v1 memory rule).
DEBT-036 closes via DEBT-050 supersedure. AttackerDetail's
``observations`` field is wired to the new ``observations`` table
and returns an empty list until the BEHAVE-SHELL extractor (DEBT-050
Phase 2) starts emitting.

decnet/web/db/models/attackers.py — SessionProfile class deleted
(~135 lines), KD_PAUSE_*/KD_START_OF_ACTION_IDLE_S module constants
deleted, module docstring updated to point at the observations
table. AttackerIdentity.kd_digraph_simhash is KEPT — it's the v2
federation centroid hook, not a SessionProfile field; docstring
repointed to the BEHAVE primitive that will populate it.

decnet/web/db/sqlmodel_repo/attackers/sessions.py — DELETED.
SessionProfilesMixin dropped from the AttackersMixin MRO.

decnet/web/db/repository.py — abstract upsert_session_profile +
get_session_profile removed.

decnet/web/db/sqlite/repository.py + mysql/repository.py —
_migrate_session_profile_table helpers and their initialize() calls
removed. mysql initialize() now goes attackers → column_types →
admin (no session_profile step).

decnet/web/db/models/__init__.py — SessionProfile re-export gone.

decnet/web/db/models/attacker_intel.py — docstring cross-reference
to SessionProfile.schema_version retargeted to AttackerIdentity.

decnet/web/router/attackers/api_get_attacker_detail.py — adds
``observations: []`` to the response by calling
``repo.latest_observation_per_primitive(uuid)`` and projecting to a
list sorted by primitive path. Empty until the extractor lands;
shape matches BEHAVE-INTEGRATION.md §"AttackerDetail consumer".

tests/profiler/test_session_profile.py — DELETED (56 lines).
tests/db/test_base_repo.py — DummyRepo loses upsert_session_profile
and get_session_profile overrides.
tests/db/mysql/test_mysql_migration.py — initialize-call-order
assertion updated; session_profile step removed from the expected
sequence; docstring records why.
tests/ttp/test_lifter_absence.py — docstring "no SessionProfile" →
"no ObservationRow".
2026-05-03 07:33:37 -04:00
0972325527 feat(web/db): observations table + repo + bus prefix (BEHAVE-INTEGRATION Phase 1)
Additive Phase 1 of BEHAVE-INTEGRATION.md. Lays the storage layer
the BEHAVE-SHELL extractor (DEBT-050) will write into. Nothing
breaks; SessionProfile coexists for now and is dropped in the
follow-up commit.

decnet/web/db/models/observations.py — new ObservationRow SQLModel
mirroring the BEHAVE Observation envelope field-for-field
(core/decnet_behave_core/spec/envelope.py). ``id`` is a hex-string
UUID (matching BEHAVE), not a typed UUID column. ``identity_ref``
is str | None — written by the future attribution engine, NULL
until then. ``attacker_uuid`` is the one DECNET-side
denormalisation; FK'd to attackers.uuid for cheap AttackerDetail
joins. ``evidence_ref`` is NOT NULL for DECNET emissions even
though the upstream envelope makes it optional — the worker's
"already profiled?" check keys on it. UniqueConstraint(evidence_ref,
primitive) enforces idempotency at the schema level so re-running
the extractor on the same shard+sid produces a DB-side conflict
the upsert path resolves deterministically. Class is named
``ObservationRow`` (not ``Observation``) to avoid colliding with
the BEHAVE Pydantic envelope at sites that import both.

decnet/web/db/sqlmodel_repo/observations.py — ObservationsMixin.
Three public methods backing the canonical queries from
BEHAVE-INTEGRATION.md §"Storage": ``upsert_observation`` (idempotent
on the natural key), ``latest_observation_per_primitive`` (per-
primitive MAX(ts) subquery, portable across SQLite and MySQL — no
DISTINCT ON), ``observations_time_series`` (asc-by-ts). Plus
``has_observations_for_evidence`` for the worker's session-already-
profiled check.

decnet/bus/topics.py — ATTACKER_OBSERVATION_PREFIX = "observation"
constant + ``attacker_observation(primitive)`` builder. Full topic
shape ``attacker.observation.<primitive>`` matches what BEHAVE's
spec.event_adapter.event_topic_for produces upstream. Documentation
+ pattern matching only — bus auth is socket file perms (DEBT-029
§2), not topic-level.

decnet/web/db/repository.py — abstract ``upsert_observation``,
``latest_observation_per_primitive``, ``observations_time_series``
on BaseRepository.

tests/db/test_observations.py — 11 tests covering upsert round-trip,
idempotency under the unique constraint, latest-per-primitive
ordering across multiple sessions, time-series asc-ordering, empty-
attacker contract, every BEHAVE ValueKind round-tripping through
the JSON column, and the has_observations_for_evidence check.

tests/db/test_base_repo.py — DummyRepo gains the three new abstract
overrides so its coverage suite still instantiates.
2026-05-03 07:25:10 -04:00
11f474556c docs(behave): integration + extractor + attribution design (DEBT-050 / 051)
Three sibling design docs plus DEBT.md updates that supersede the
stale DEBT-036 with a BEHAVE-aligned plan.

development/BEHAVE-INTEGRATION.md — five-phase rollout: storage
(observations table mirroring the BEHAVE Observation envelope plus
one DECNET-side denorm; UniqueConstraint(evidence_ref, primitive)
enforcing idempotency); engine (in decnet/profiler/behave_shell/
sublibrary, no new daemon, not in BEHAVE — DECNET is the engine);
BEHAVE pin; worker wire; UI panel + per-attacker SSE route; live
smoke. Bus payload merges id/ts/v back in to preserve sensor
identifiers across the bus envelope.

development/BEHAVE-EXTRACTOR.md — engine route in eight phases
(A–H). Phase A locks the 6-primitive calibration grid; Phases B–G
expand horizontally; Phase H is the full Tier-A corpus + v0
release. v0 ships every shell-extractable primitive (37 of them);
Tier B is cross-session and lives in the attribution engine; Tier
C is network-domain (toolchain.*) and lives elsewhere.

development/ATTRIBUTION-ENGINE.md — sublibrary inside
decnet/correlation/ that consumes attacker.observation.* events
and emits attribution.profile.* derived state. Five-state machine
(unknown / stable / drifting / conflicted / multi_actor) with per-
ValueKind merge functions. v0 closes DEBT-051; v1 adds the real
clusterer; v2 federation gossip. The bright line forbidding
attribution to natural persons is lifted directly from BEHAVE's
envelope docstring.

development/DEBT.md — DEBT-036 marked STALE; DEBT-050 and
DEBT-051 entries added; summary table + open list updated.
2026-05-03 07:24:19 -04:00
3f080f601d feat(intel,ingester): mal_hash feed + observed_attachments table (DEBT-046)
New MalHashProvider sibling ABC (decnet/intel/base.py) since SHA-256
is a different keyspace from IntelProvider's IPs. MalwareBazaarProvider
mirrors FeodoProvider's bulk-feed shape: 24h refresh via _ensure_fresh
/ _refresh, in-memory set[str] of hex-lowercased hashes, set-membership
lookup. Auth-keyed via DECNET_MALWAREBAZAAR_AUTH_KEY; absent key
silent-no-ops the lane (single warning, no HTTP traffic).

Per-hash observations persist to a new observed_attachments table.
DECNET is a honeypot platform — every attachment hash an attacker
delivers is intel, regardless of whether anyone classified it. Verdict
is sticky: True never downgrades to False/None on subsequent
observations. Out of scope: API surface, federation export, retention.

Ingester _publish_email_received calls the provider for each attachment
sha256, sets mal_hash_match on the bus payload (omitted entirely when
the message had no attachments — keeps R0046's `is True` predicate
silent on hash-less mail, matching pre-paydown behavior), and upserts
the row regardless of provider availability.
2026-05-03 05:56:46 -04:00
03beff3840 feat(orchestrator): authoritative failure-count badge endpoint (DEBT-042)
New GET /api/v1/orchestrator/events/stats?since=1h&success=false&kind=...
backed by repo.count_orchestrator_failures(since_ts, kind), which
counts failed rows across both orchestrator_events and
orchestrator_emails since the cutoff.

Window parser accepts ^\d+[smhd]$, capped at 7d. Today only
success=false is accepted on this surface so the endpoint isn't
accidentally repurposed before the next consumer is properly
designed.

Orchestrator.tsx polls the endpoint on mount + every 30 s and
renders the authoritative DB-derived count instead of deriving from
the in-memory SSE buffer + one paginated page (which silently
excluded failures older than the local window).
2026-05-03 05:26:45 -04:00
866a76eccf test(web): scaffold vitest + RTL with Orchestrator seed suite (DEBT-043)
Wire vitest 4 + jsdom + @testing-library/{react,jest-dom,user-event}
+ @vitest/coverage-v8 through vite.config.ts (defineConfig from
vitest/config). src/test/setup.ts registers jest-dom matchers and
RTL cleanup. tsconfig.app.json picks up vitest/globals types.

Seed suite Orchestrator.test.tsx covers the three regressions
called out in DEBT-043: empty-state render, kind-filter toggling
triggers a scoped refetch, mocked stream callback prepends a row.
2026-05-03 05:20:01 -04:00
6c6f97e840 feat(prober,correlation): attacker fingerprint rotation detection (DEBT-032)
When the prober observes a NEW hash for an
(attacker_uuid, port, probe_type) triple it has seen before — VPS
rotation, SSH server rebuild, TLS cert swap — emit a derived
attacker.fingerprint_rotated event carrying both old and new hash.
Detection is a small library (decnet.correlation.fingerprint_rotation)
called inline from the prober at each of the three emit sites
(JARM/HASSH/TCPFP). No new daemon. New AttackerFingerprintState table
holds per-triple last-hash state; Attacker.rotation_count and
Attacker.last_rotation_at are stamped on every diff. Library is sync,
fully unit-tested via injected publish_fn / syslog_fn callbacks.
2026-05-03 05:12:51 -04:00
dcd558fd91 chore(infra): pin Docker base images by digest (DEBT-023)
All base images (debian:bookworm-slim, ubuntu:22.04, ubuntu:20.04,
rockylinux:9-minimal, centos:7, alpine:3.19, fedora:39,
kalilinux/kali-rolling, archlinux:latest, honeynet/conpot:latest)
now carry their resolved sha256 digest so 'docker pull' is
deterministic. :tag retained for human readability; @sha256 is what
Docker actually resolves. Refresh procedure documented at the top of
decnet/distros.py.
2026-05-03 04:38:39 -04:00
6e19d3a25a chore(bait): scaffold default seed dir with README
Empty directory tracked via .gitkeep so operators see it on first
clone; README documents the .eml/.json drop-in flow that the IMAP/POP3
compose fragments wire up by default.
2026-05-03 04:30:09 -04:00
b3a96a045f feat(mail): default email_seed → \$PROJROOT/bait/ when unset
When service_cfg["email_seed"] is absent, compose_fragment now falls
back to $PROJROOT/bait/ if that directory exists on the host. Lets
operators drop a deployment-wide bait corpus into one place without
threading email_seed through every decky's config. Missing dir keeps
old no-op behavior.
2026-05-03 04:25:24 -04:00
b88d67794d feat(mail): operator-tunable IMAP/POP3 email seed (DEBT-026)
IMAP_EMAIL_SEED / POP3_EMAIL_SEED accept a directory (rglob *.eml +
*.json) or a single .json/.eml. Loaded entries CONCATENATE with the
hardcoded _BAIT_EMAILS — additive to the realism-engine emailgen
output rather than replacing it. JSON dicts require from_addr /
to_addr / subject / body; bare bodies are wrapped into RFC 5322 on
load. compose_fragment reads service_cfg["email_seed"] and bind-mounts
the host path read-only at /var/spool/decnet-emails/seed.
2026-05-03 02:47:06 -04:00
e0b07651fd docs(debt): mark DEBT-047 resolved (EmailLifter disk-reach + ttp agent gate) 2026-05-02 20:07:54 -04:00
79674026dd feat(cli): allow decnet ttp on agents (DEBT-047)
The TTP-tagging worker is now safe to run on agent hosts: EmailLifter
disk-reaches body-aware predicates from the local artifacts tree
(DEBT-035 unblocked filesystem access; DEBT-047 added the helper).

Drop `ttp` from MASTER_ONLY_COMMANDS in cli/gating.py and remove the
defence-in-depth `_require_master_mode("ttp")` call in cli/ttp.py.
`ttp-backfill` walks the master DB and stays master-only.
2026-05-02 20:07:03 -04:00
e972d870de feat(ttp): EmailLifter disk-reach for body-aware predicates (DEBT-047)
R0047 (BEC) and the encoded-payload predicate substring-match against
the email body. Shipping raw body text on the abstracted service bus
is the wrong privacy stance — the bus transport may swap from UNIX
socket to networked at any time, and "loopback today" is not a license
to put PII on the wire.

EmailLifter now opens the .eml lazily from
/var/lib/decnet/artifacts/{decky_id}/smtp/{stored_as} when a body-aware
predicate runs and parses the body in-process via stdlib email +
policy.default. The decoded body is memoized into the payload dict so
multiple body-aware predicates on the same event open the file once.

Bus envelope only carries the artifact pointer (decky_id + stored_as);
raw body bytes never cross the host disk boundary on the agent → master
hop. Filesystem access on agents is unblocked by DEBT-035 (setgid +
group-readable artifacts root, paid 2026-05-02).

The legacy inline body_text path is preserved — when the producer ships
body_text on the bus the helper short-circuits without opening the file.
2026-05-02 20:05:54 -04:00
7036a86e76 refactor(artifacts): extract resolve_artifact_path to shared module
Move artifact path validation + symlink-escape check out of the
admin-gated download endpoint into decnet/artifacts/paths.py so the
TTP EmailLifter can disk-reach .eml files at tag-time without
duplicating regex/root logic (DEBT-047).

The router now catches ArtifactPathError and re-raises HTTPException(400);
behavior is unchanged.
2026-05-02 20:02:47 -04:00
cdbb3d3571 fix(ssh,telnet): move PROMPT_COMMAND out of /root/.bashrc + pin readonly
ANTI flagged two regressions in the existing command-event capture:

1. **Tell**: PROMPT_COMMAND lived in /root/.bashrc, the FIRST file
   an attacker greps after landing root. The logger invocation
   sitting there is plain-text honeypot signage.
2. **Bypass**: even when missed, `export PROMPT_COMMAND=""` silently
   disables capture. ANTI personally bypasses this on engagements.

Reshape:

* Move the assignment to **/etc/environment** — read by pam_env at
  session open (sshd via /etc/pam.d/sshd, telnet via
  /etc/pam.d/login), before any shell rc file fires. Far less
  obvious than .bashrc; a casual `cat .bashrc` no longer surfaces
  the capture.
* Define the helper as a function `__bash_history_sync` in
  **/etc/bash.bashrc** (system-wide bashrc, sourced by every
  interactive bash). Function name reads as generic bash
  housekeeping; no DECNET branding in the symbol.
* Pin both the function and PROMPT_COMMAND **readonly** so
  `export PROMPT_COMMAND=""` fails with "readonly variable"
  instead of silently winning. Mitigation, not airtight —
  `bash --norc` still bypasses — but the passive `export`
  bypass is closed.

The actual `logger --rfc5424 --msgid command ... CMD ...` invocation
is preserved exactly; only its location and the readonly guard
change. R0001–R0030 (command-rule pack) consume the same syslog
shape as before.

Three new tests assert: the value lands in /etc/environment, the
function body lives in /etc/bash.bashrc, no PROMPT_COMMAND line
remains in /root/.bashrc, and `readonly PROMPT_COMMAND` /
`readonly -f __bash_history_sync` are both present. Mirror
assertions added on the Telnet Dockerfile via
test_config_schema.py.
2026-05-02 19:50:24 -04:00
3e9c4c29b9 feat(ssh,telnet): add non-root user account for privesc + enum lure
Real Linux deployments (especially Ubuntu cloud images) ship a non-
root admin user; honeypots that only accept root logins are a tell.
Add a second account on both SSH and Telnet decoys, configurable
via service_cfg keys `user` / `user_password`, defaulting to
`ubuntu` / `admin` so the lure is live on every fresh deploy.

* `decnet/services/{ssh,telnet}.py` — two new ServiceConfigFields
  (`user` string, `user_password` secret) and matching env vars
  (`SSH_USER` / `SSH_USER_PASSWORD`, mirror for telnet) propagated
  via the compose fragment.
* `decnet/templates/ssh/entrypoint.sh` — runtime `useradd -m -s
  /usr/libexec/login-session -G sudo "$SSH_USER"` so the new user
  inherits the same sessrec pty-recording shell as root and lands
  in the sudo group. Privesc attempts (`sudo`) flow through the
  existing sudo-log capture; network-enum from the user's shell
  rides the recorded transcript.
* `decnet/templates/telnet/entrypoint.sh` — same useradd pattern
  (no sudo group — busybox+login telnet image has no sudo
  package; privesc rides `su -` which itself flows through the
  existing PAM auth-helper at /etc/pam.d/login).
* New tests for default + custom user / password + independence
  from root password. Updated the schema-keys assertion to match
  the four-field shape.

The new account is ALSO the natural home for the body-aware
predicates that were previously gated on root-only sessions —
attackers who land on `ubuntu@host` and run network-recon /
privesc commands now generate the same structured TTP-rule
events as root sessions did, captured via the same auth-helper
+ sessrec + sudo-log pipes.
2026-05-02 19:48:03 -04:00
c675bd26cf docs(debt): mark DEBT-035 resolved; lift DEBT-047 filesystem-access blocker
DEBT-035 (artifacts written as the container uid, not the API's) is
resolved by the two preceding commits:
* 39a298f6 — persists DECNET-service api-user/api-group as names in
  decnet.ini for any future composer / worker that wants to resolve
  the local uid via pwd.getpwnam.
* b2733216 — creates /var/lib/decnet/artifacts at init time with mode
  0o2775 (setgid + group-write) owned by the DECNET-service
  user:group.

The setgid bit is the load-bearing fix: Linux mkdir(2) propagates a
parent's group AND its setgid bit to every new subdirectory. Docker
auto-creates the per-decoy / per-service subtree as bind-mounts fire,
so those subdirs come up with group=decnet and setgid set; container
file writes (default umask 0o022 → mode 0o644) inherit the decnet
group; the API process and the local TTP worker (both running as the
DECNET-service user, primary group decnet) read via group-read.

The original recommendation of compose `user:` injection turned out
infeasible for SSH and Telnet — PAM's setuid(2) during login
fundamentally cannot run from a non-root container. Setgid covers
both root-internal and unprivileged-internal templates uniformly
without requiring per-template carve-outs.

DEBT-047 (R0047 BEC disk-reach) was gated on DEBT-035 for filesystem
access. That blocker is lifted — `decnet ttp` running on agents as
the local DECNET-service user can now read .eml files written by
the SMTP decoy. The remaining DEBT-047 work is the master-only gate
flip in decnet/cli/gating.py and the EmailLifter disk-reach helper
itself (factor _resolve_artifact_path out of the artifacts API
endpoint into a shared module).

Soft-fail paths in api_get_transcript.py and api_get_artifact.py
stay as defence-in-depth — option 2 should make them never fire on
a healthy install but a misconfigured deploy must not 500 the API.
2026-05-02 19:40:12 -04:00
b27332169d feat(init): create /var/lib/decnet/artifacts with setgid + group-write
DEBT-035 step 2. Today the artifacts subtree is auto-created by
Docker as root when a decoy container's bind-mount fires for the
first time. The resulting permissions are root:root 0o755 — the API
process (running as the decnet user) hits PermissionError trying to
read transcripts written by the container, and the soft-fail 404
path gets exercised on every fresh deploy.

Add `/var/lib/decnet/artifacts` to init's dirs list with mode 0o2775:

* 0o2000 — setgid bit. New files inherit the directory's group
  (decnet), regardless of which uid created them. This is the load-
  bearing bit for cross-container reads.
* 0o0775 — owner+group rwx, world rx. Group-write lets the API
  process and the local TTP worker read each other's outputs
  without a manual chown.

`_ensure_dir` already respects the full mode word via `os.chmod`,
no helper change needed.

Test asserts the resulting directory carries exactly 0o2775 after
a fresh `decnet init --prefix`. Defence-in-depth: this works even
if the per-decoy compose `user:` directive (next commit) misses a
template — files still land in the decnet group.
2026-05-02 19:35:20 -04:00
39a298f685 feat(init): persist DECNET-service api-user/api-group to decnet.ini
DEBT-035 step 1. The composer needs to know which uid/gid to inject
into each compose fragment's `user:` directive at deploy time. Today
the resolved `--user` / `--group` values reach systemd unit
rendering (init.py:349–354) but are not persisted anywhere the
composer can read them.

Persist as **names** (not numeric ids) under `[decnet] api-user` /
`api-group` in the rendered decnet.ini placeholder. Resolution to
uid/gid happens at deploy time on whichever host runs the deploy,
via `pwd.getpwnam(...)` / `grp.getgrnam(...)` — so the same user
name can have different uids on master vs agents (heterogeneous
/etc/passwd) without breaking artifact ownership. The existing
config_ini auto-translates kebab→DECNET_API_USER / DECNET_API_GROUP
at load time; no domain-map changes needed.

Two new tests: one asserting the rendered ini carries the
`api-user` / `api-group` keys for the values passed to `--user` /
`--group`; one round-tripping through `load_ini_config` to confirm
the env vars land in `os.environ` for the composer to pick up.
2026-05-02 19:33:53 -04:00
b3ea3fa925 docs(debt): merge rogue root DEBT.md into the canonical development/DEBT.md
A previous agent (and several of my own commits) wrote to a top-level
DEBT.md without seeing the existing development/DEBT.md — the
canonical register since DEBT-001. Resulted in two parallel files,
inconsistent numbering schemes, and references that resolved to the
wrong place.

Migrate the six entries that landed in the rogue file into the
canonical register as DEBT-044 through DEBT-049, preserving their
status (resolved / partial / open) and cross-references. The
TTP_TAGGING.md references to "DEBT.md" already resolve to
development/DEBT.md by virtue of being in the same directory; only
the comment in decnet/ttp/impl/intel_lifter.py needed disambiguation
to "development/DEBT.md DEBT-048".

* DEBT-044 — `attacker.email.received` producer wiring ( RESOLVED 2026-05-02)
* DEBT-045 — EmailLifter heavyweight feature extraction (PARTIAL PAID 2026-05-02)
* DEBT-046 — EmailLifter mal-hash feed integration (open)
* DEBT-047 — EmailLifter R0047 BEC unblock (open, gated on DEBT-035)
* DEBT-048 — TTP intel provider mapping review (recurring quarterly)
* DEBT-049 — TTP Sigma adapter — post-v1 (open)

Summary table extended; "Remaining open" line updated; root file
removed. The DEBT-047 entry now explicitly cross-references DEBT-035
as the gating dependency for the R0047 BEC unblock.
2026-05-02 19:17:20 -04:00
17367d0a69 docs(debt,ttp): retire shipped lanes; file mal-hash-feed and R0047-disk-reach entries
Mark the EmailLifter heavyweight follow-up as PARTIAL PAID — R0042 /
R0046 (macro / password / smuggling lanes) / R0048 fire end-to-end
after commits 291b78c1 (decky extractors) and the ingester producer
projection that follows.

Two narrower DEBT entries replace the lanes that remain gated:

* "EmailLifter mal-hash feed integration" — R0046's mal_hash_match
  lane needs a curated bad-hash feed (MalwareBazaar SHA-256 dump as
  the v0 candidate, mirroring the FeodoProvider bulk-feed pattern at
  decnet/intel/feodo.py). Feed integration, not extraction. Lifter
  predicate already reads `payload.get("mal_hash_match")` — silent
  today only because the field is absent.
* "EmailLifter R0047 BEC — unblock when artifact disk-reach lands"
  cross-references the agent UID/GID DEBT entry that blocks
  `decnet ttp` from reading artifacts written by deckies on the
  same host. Disk-reach is the intended solution; raw body_text on
  the bus is rejected because the bus transport is abstracted (the
  UNIX-socket implementation may swap to networked at any time, and
  privacy decisions must hold regardless of transport).

Append to TTP_TAGGING.md §"Producer wiring": the email.received
producer pointer (was "none — DEBT"), the full per-message payload
shape with the new heavyweight fields, and an explanatory block on
why the bus is body-text-free + how R0047 / R0048 each handle their
body dependency (R0048 via the precomputed scalar; R0047 deferred).
2026-05-02 19:12:30 -04:00
c714941069 feat(bus): project EmailLifter heavyweight fields onto email.received
The decky's Layer-2 extension (commit 291b78c1) emits body_simhash /
body_base64_bytes / html_smuggling on the message_stored log and adds
macro_indicator / encrypted booleans to each attachments_json
manifest entry. Lift them all onto the email.received bus payload:

* body_simhash — passes through as-is (16 hex chars or "")
* body_base64_bytes — coerced to int (0 on absent / malformed)
* attachment_macros / attachment_password_protected — OR-reduced
  across the per-attachment manifest booleans; matches R0046's
  matched_trigger semantics where a single positive lane fires the
  rule
* html_smuggling — coerced bool from the decky's 0/1 int

Pre-Layer-2 message_stored events (older deckies, malformed log
rows) project to safe defaults: empty simhash, zero base64-bytes,
all booleans False — the EmailLifter then stays silent, never
fires a false positive on missing data.

R0042 (mass-phish) / R0046 macro / R0046 password / R0046 smuggling
/ R0048 (encoded payload) all fire end-to-end after this commit.
R0046 mal_hash_match and R0047 BEC remain deferred per their
respective DEBT entries (filed in the next commit).
2026-05-02 19:10:30 -04:00
291b78c1d0 feat(smtp): extract body_simhash + base64-bytes + html-smuggling + per-attachment macro/encrypted
Heavyweight Layer-2 extractors land alongside the cheap projections
shipped in commit e9324aca, so the EmailLifter R0042 / R0046 (macros
/ password / smuggling lanes) / R0048 fire from the bus payload
without the lifter having to reach back to disk.

Extractors:
* body_simhash — inlined 64-bit Charikar simhash (md5-keyed,
  frequency-weighted) over word tokens of the union of text/* body
  parts. Inlined rather than pulling the `simhash` PyPI dep, which
  transitively brings numpy ~50 MB into a slim decky container; the
  algorithm is ~15 lines and identical in extraction quality.
* body_base64_bytes — largest decoded base64 chunk's byte count,
  scanning text body parts with the same `_BASE64_RE` the lifter's
  `_p_encoded_payload` fallback uses. R0048 fires from this scalar
  alone; the lifter's body_text fallback becomes dead in normal
  operation.
* attachment_macro_indicator — stdlib zipfile sniff for
  `vbaProject.bin` inside OOXML containers. Catches modern .docm /
  .xlsm / .pptm and macro-injected .docx; legacy .xls (CFBF) is a
  follow-up.
* attachment_encrypted — flag_bits & 0x01 on any ZIP / OOXML entry's
  central directory; magic-byte match for 7z / RAR / CFBF (encrypted
  Office wrap).
* html_smuggling — structural lxml parse first: fires when an `<a
  download>` element coexists with a `<script>` referencing
  `Blob` / `Uint8Array` / `URL.createObjectURL`. Regex pair-check
  fallback on lxml parse failure (real-world phish HTML is often
  malformed). Cuts the FP rate that pure-regex would produce on
  legitimate "click to download" links.

Add `python3-lxml` (~5 MB Debian package, C-extension, no transitive
Python deps) to the SMTP decky's Dockerfile. simhash stays inline.
Per the dependency rule: lxml earns its weight by cutting R0046's
OR-combined FP rate; a heavier macro-detection lib (oletools ~5 MB
pure-python with msoffcrypto) would not measurably improve the
boolean signal we need, so stdlib stays for that lane.
2026-05-02 19:08:37 -04:00
fb85762703 feat(bus): publish email.received from ingester after SMTP artifact persist
Wires the EmailLifter (R0041–R0048) producer that DEBT.md item #3
deferred. After the existing add_bounty() call in _extract_bounty
(line 615), call _publish_email_received() which:

* resolves the attacker_uuid via repo.get_attacker_uuid_by_ip; drops
  the publish if unresolved (the TTP worker can't anchor orphan
  events)
* projects the message_stored fields onto the EmailLifter wire
  contract: from_domain / mail_from_domain / return_path_domain
  parsed via _domain_of, rcpt_count + rcpt_domains via
  _rcpt_projection, attachment_sha256s + attachment_extensions
  derived from the existing attachments_json manifest, urls from
  urls_json, dkim_signed/spf_pass coerced from 0/1 ints to bool
* mirrors _publish_probe_pending's bus-per-call pattern and
  swallows all exceptions (the bus is the notification layer, not
  the source of truth)

Fires for both relay and non-relay SMTP services. R0041 / R0043 /
R0044 / R0045 are now live end-to-end; R0046 partial (extension
lane). Heavyweight predicates (R0042 simhash, R0046-deep, R0047 /
R0048 body_text) stay deferred per the EmailLifter heavyweight
DEBT entry.
2026-05-02 18:39:13 -04:00
e9324acac7 feat(smtp): emit X-Mailer / Return-Path / dkim+spf / URLs on message_stored
The EmailLifter (R0041–R0048) keys on header-derived signals that the
v0 _summarize_message did not extract. Add cheap Layer 2 projections
inside the existing single-pass parse:

* return_path / x_mailer — direct header reads, decoded RFC 2047
* dkim_signed / spf_pass — booleans derived from any
  Authentication-Results header (multiple lines tolerated; positive
  verdict on any line wins)
* urls — http(s) URLs lifted from text/* body parts via a tight
  regex, deduplicated first-seen-wins, capped at 64 in the wire
  payload to bound the syslog SD value

Heavyweight extraction (body simhash, office-macro detection,
HTML-smuggling, password-protected archives, mal-hash-match,
body_text projection) stays deferred per the EmailLifter heavyweight
DEBT entry — those rules need privacy / extractor decisions before
they ship.
2026-05-02 18:37:11 -04:00
2ce150a53e docs(debt): mark email.received producer as paid; file heavyweight follow-up
The 2026-05-02 paydown wires the producer at ingester.py after
add_bounty(), with the cheap projections (domains, rcpt_count,
attachment_count, x_mailer, dkim/spf, attachment shas + extensions,
URLs). R0041 / R0043 / R0044 / R0045 fire end-to-end after this PR;
R0046 partial.

The remaining lanes (R0042 body_simhash, R0046 macro / smuggling /
password / mal_hash, R0047 / R0048 body_text projection) are filed
as a new entry "EmailLifter heavyweight feature extraction" with the
field map and the privacy-vs-completeness fork on body_text called
out for the next maintainer to pick a side.
2026-05-02 18:24:51 -04:00
9a7d116351 docs(ttp): sync A.10 + rewrite §9 drift runbook + DEBT.md markers
Appendix A.10 corrected to match the post-2026-05-02-audit reality:
AbuseIPDB cat 7/13/16/17 land on their canonical AbuseIPDB names
(Phishing / VPN IP / SQL Injection / Spoofing); cats 4 and 10 carry
explicit "drop" annotations so the next reviewer sees the intent
rather than guessing. ThreatFox table re-keys on `threat_type` (the
canonical taxonomy field) and adds the `payload` and `cc_skimming`
rows. GreyNoise table promotes bare-malicious to a half-multiplier
emission of T1071.

§"Hard parts §9 Intel provider drift" replaces the prose handwave
with a runnable check: provider URLs, the ThreatFox curl invocation
that needs DECNET_THREATFOX_API_KEY, the rule_version + emits +
attack_catalog co-evolution rules, and the full chain of files to
exercise. Adds a "Ship-time audit log" subsection so future quarterly
runs have a known-good baseline to diff against.

DEBT.md item #1 records LAST_REVIEWED: 2026-05-02 / NEXT_REVIEW:
2026-08-02 and points at §9 for the runbook. DEBT.md item #3 (the
attacker.email.received producer) flags its gating premise as
potentially stale — ANTI noted SMTP honeypots already persist
received messages, contradicting the "no source row" claim that
deferred the wiring.
2026-05-02 18:09:20 -04:00
f8dee596e5 fix(ttp): expand R0054/R0055/R0057 emits + LAST_REVIEWED markers
The IntelLifter's _emit_filtered fans out only the rule.emits entries
whose technique_id appears in the predicate's decision set. v1's emits
lists were narrow supersets of the common case, silently dropping the
rest of the predicate's possible emissions:

  R0054 dropped: T1046 (cat 14), T1078 (cat 20), T1090 (cats 9/13),
                 T1496 (cat 11), T1595 (cats 14/19)
  R0055 dropped: T1090 (tor_exit_node), T1110 (ssh_bruteforcer),
                 T1588 (the second emit of every C2-framework tag)
  R0057 dropped: T1105 (payload_delivery, download_url)

Bump rule_version 1->2 on R0054/R0055/R0057, expand emits to cover
every technique the predicate produces. R0056 (Feodo) and R0058
(aggregate bump) carry no enum and stay at v1.

All five YAMLs gain `last_reviewed: "2026-05-02"` and
`next_review: "2026-08-02"` markers; the rule YAML is now the
canonical record of when the mapping was last reconciled against
upstream, with DEBT.md as the calendar reminder.
2026-05-02 18:09:03 -04:00
75ff0ede1f fix(ttp): correct intel_lifter mappings + repoint ThreatFox to threat_type
Three bug classes uncovered by the 2026-05-02 ship-time audit:

* AbuseIPDB code/name mismatch in v1: cat 10 was treated as DDoS (it's
  Web Spam — DDoS is cat 4, intentionally unmapped per A.10) and cat 17
  as VPN IP (it's Spoofing — VPN IP is cat 13). Both typos mirrored in
  code AND the design doc Appendix A.10. Code now matches the AbuseIPDB
  taxonomy exactly; cat 17 retargets to T1566 (email-spoofing as a
  phishing precursor), and cats 7 (Phishing) and 16 (SQL Injection)
  pick up T1566 / T1190 emissions that v1 didn't cover.

* ThreatFox dispatch keyed on `ioc_type` in v1, but `ioc_type` is the
  indicator format (url / domain / hash variants) and carries no ATT&CK
  signal. The canonical taxonomy field per ThreatFox's API is
  `threat_type` (botnet_cc / payload_delivery / payload / cc_skimming).
  Repoint dispatch through the new `threatfox_threat_types` payload
  field; `ioc_type` rides as evidence only. Also adds the missing
  cc_skimming -> T1056 (Input Capture) mapping and registers T1056 in
  attack_catalog.py.

* GreyNoise bare-malicious lane: a `classification == "malicious"` row
  with no recognised tag used to emit nothing. Now lights T1071 at a
  half multiplier, suppressed when a tag already fires T1071 to avoid
  double-stamping at conflicting confidence levels.
2026-05-02 18:08:48 -04:00
a31ad82880 feat(intel): project per-provider taxonomy into attacker.intel.enriched payload
The TTP worker forwards the bus payload verbatim to the IntelLifter as
TaggerEvent.payload. The pre-audit publish payload only carried
{attacker_uuid, attacker_ip, aggregate_verdict, providers}, so even with
the new AttackerIntel taxonomy columns populated the lifter still saw
nothing. Lift the relevant fields (categories / tags / threat_types /
malware family / score / classification) into the bus event and decode
JSON-string list columns back to native lists at the boundary.
2026-05-02 18:08:29 -04:00
999d3494b4 feat(intel): persist per-provider taxonomy on AttackerIntel for TTP dispatch
The 2026-05-02 ship-time audit of the R0054-R0058 intel rule pack found
that AbuseIPDB / GreyNoise / ThreatFox stored only the aggregate verdict
(score / classification / listed-bool) plus the raw response blob. The
TTP IntelLifter expects per-provider taxonomy fields (categories, tags,
threat_types) that were never populated, so R0054 / R0055 / R0057
emitted zero tags in production despite passing unit tests.

Add typed columns: abuseipdb_categories, greynoise_tags, greynoise_name,
feodo_malware_family, threatfox_threat_types, threatfox_ioc_types,
threatfox_malware_families. Each provider now parses the relevant
taxonomy out of the upstream response and writes it through
column_updates. JSON-list columns ride as TEXT with default "[]" to
keep the SQLite/MySQL backend split honest, deserialised back to native
lists by the repo on read.
2026-05-02 18:07:57 -04:00
d1c4a48963 feat(ttp): split bash CMD evidence into structured uid/user/src/pwd/cmd rows
The inspector was dumping the whole `CMD uid=0 user=root src=… pwd=…
cmd=nmap -p- 192.168.1.0/24` syslog body into a single ``command_text``
blob. ANTI: "I'd like to separate the fields." Done — three layers
work together:

1. Collector session aggregator: new `_parse_cmd_msg` splits the bash
   PROMPT_COMMAND msg into `{uid, user, src, pwd, command}`. The
   session-ended envelope's per-command dict now carries the
   structured fields, with `command_text` set to just the cmd= value
   (preserving embedded whitespace — `nmap -p- 1.2.3.0/24` etc.).

2. Rule engine: per-source_kind auxiliary evidence list
   (`_AUX_EVIDENCE_FIELDS`). For `command` events the engine
   automatically promotes uid/user/src/pwd into the persisted
   `evidence` dict on top of the rule's explicit `evidence_fields`.
   Engine-controlled, not per-rule — adding a new aux field is one
   line here, not a 30-rule YAML sweep, and rule authors can't
   accidentally drop it.

3. TTPInspector frontend: evidence renders as a structured
   `kvs` grid (UID / USER / SRC / PWD / CMD rows) instead of
   pretty-printed JSON. Primary-order list keeps shell fields at
   the top; everything else falls below alphabetically so unfamiliar
   evidence shapes still surface predictably.

Tests:
- session_aggregator pins the structured-fields emit (uid/user/src/
  pwd/command_text without "CMD" prefix, embedded whitespace
  preserved).
- rule_engine_tagger pins the aux-field auto-promotion + the
  no-`None`-leakage path when payload doesn't carry an aux key.
2026-05-02 03:20:53 -04:00
84699f89da feat(ttp): show canonical ATT&CK technique names in the TTPs UI
"T1595" alone is opaque; "T1595 — Active Scanning" tells you the
story at a glance. The names come from a backend-side static catalogue
pinned to the same ATT&CK release as the rule engine
(_ATTACK_RELEASE = "v15.1") — names are the canonical MITRE labels,
not author-supplied strings on rules, so a rule author can't typo a
name and the entire fleet sees the typo.

- New `decnet/ttp/attack_catalog.py` with `TECHNIQUE_NAMES` covering
  every technique_id + sub_technique_id emitted by `rules/ttp/`
  (R0001..R0058 → 69 IDs in the v0 pack).
- `IdentityTechniqueRow` / `TechniqueRollupRow` / `CampaignTechniqueRow`
  / `TTPTagDetailRow` gain optional `technique_name` /
  `sub_technique_name` fields. Repo + router populate them from the
  catalogue at row-construction time. None when an ID isn't in the
  catalogue — UI falls back to the bare ID.
- Coverage test (`tests/ttp/test_attack_catalog.py`) walks every
  YAML rule and asserts every emitted ID has a catalogue entry, so
  a future rule author who forgets to update the catalogue gets a
  loud failure rather than a silent UI fallback.

Frontend:
- `TTPsObservedSection` shows "T1595.002 — Active Scanning:
  Vulnerability Scanning" instead of just the ID, with overflow
  ellipsis + tooltip for narrow viewports. Inspector header /
  TECHNIQUE row also surface the names.
2026-05-02 03:10:07 -04:00
42e9492118 feat(ttp): inspector drawer surfaces evidence + rule_id behind each technique
The TTPsObservedSection rollup tells the operator "we saw T1059" but
not why. Click any technique row → side drawer opens listing every
ttp_tag row in scope with the persisted evidence JSON, firing
rule_id / rule_version, source_kind / source_id, confidence, and
created_at. Mirrors the CredentialReuseInspector / BountyInspector
pattern (drawer-backdrop + bd-head/bd-body + kvs grid).

Backend:
- New `GET /api/v1/ttp/tags/by-{scope}/{uuid}/{technique_id}`
  (`scope ∈ {identity, attacker, session}`, optional
  `?sub_technique_id=`, `?limit=` capped to 1000). Returns raw
  TTPTag rows newest-first.
- New `TTPTagDetailRow` Pydantic model + re-export.
- New repo method `list_tags_by_scope_and_technique` on
  TTPMixin (+ abstract on BaseRepository) — single query branched
  on scope; identity scope projects through `Attacker.identity_id`
  the same way `list_techniques_by_identity` does.
- Tests: evidence round-trips, sub_technique filter, JWT-required,
  empty scope, unknown scope rejected.

Frontend:
- New `TTPInspector.tsx` + `TTPInspector.css` (violet accent, slide
  animation, focus-trapped panel matching the existing inspector
  family).
- `TTPsObservedSection`'s TechniqueBar is now click+keyboard
  activatable; clicking opens the inspector for that
  (technique, sub_technique) tuple.

mypy clean. 532 passed in the targeted sweep.
2026-05-02 02:55:05 -04:00
c4e29e3bf9 fix(ttp): resolve attacker_uuid from attacker_ip on bus-event consume
The collector's `attacker.session.ended` envelope carries
`attacker_uuid: null` and `attacker_ip: <ip>` because the collector
doesn't talk to the DB. The TTP worker passed that null straight
through, and `TTPTag.__init__` raised the documented invariant:

    ValueError: ttp_tag requires at least one of attacker_uuid /
                identity_uuid; both NULL is not a valid anchor.

The worker now resolves `attacker_uuid` from `attacker_ip` via
`BaseRepository.get_attacker_uuid_by_ip` before fanning out the
event. When the IP isn't in the DB yet (profiler hasn't ingested
the row), the event is dropped with one log line — better than
exploding mid-tag.

- New `get_attacker_uuid_by_ip(ip) -> str | None` on the repo
  (BaseRepository abstract + AttackersCoreMixin impl).
- `_resolve_attacker_uuid` helper in `decnet/ttp/worker.py` runs
  before `_build_events`. Short-circuits when the payload already
  has either anchor; drops the event when neither anchor is
  resolvable.
- Tests pin: short-circuit on existing uuid/identity, repo lookup,
  drop on unknown IP, drop on "Unknown" sentinel, drop on
  no-anchor payload, drop on repo failure.
2026-05-02 02:44:30 -04:00
f9901befc4 docs(ttp): catalogue producer wiring for every TTP-watched topic
Add a "Producer wiring" subsection under TTP_TAGGING.md §"Bus
topics" mapping every topic the TTP worker subscribes to onto the
file:line that publishes it. Calls out the gap (`email.received`
has no producer today) and the new `attacker.session.ended`
payload shape from the collector aggregator.

Also lists the four producer regression tests added in this series
so a future contributor sees the safety net before staring at the
silent rule engine.

DEBT.md gets the `attacker.email.received` follow-up entry — wire
the producer when SMTP-receive persistence lands, since today the
honeypot relay path doesn't store received emails anywhere a
publisher could read from.
2026-05-02 02:39:23 -04:00
b5ce236cab test(bus): pin scope-(2) producer wiring for reuse / clusterer / intel
Three producer-side regression guards. Each drives the worker's run
loop with a fake bus + stubbed repo and asserts the documented topic
fires when the producer has data:

- reuse correlator → credential.reuse.detected (one finding row)
- clusterer → identity.formed + identity.merged (one ClusterResult)
- intel worker → attacker.intel.enriched (one unenriched attacker
  + a fake provider returning a "malicious" verdict)

These complement commit 1's attacker.session.ended producer test —
together the four cover every TTP-relevant publisher in the tree
(modulo email.received, which has no producer yet; tracked in
DEBT.md).
2026-05-02 02:38:24 -04:00
b043c96d29 feat(collector): publish attacker.session.ended on session_recorded events
The TTP worker subscribes to attacker.session.ended but no upstream
component published it — the rule pack (R0001–R0030) therefore never
fired on live SSH traffic even after the consume-side wiring landed
in E.3.18a/b/c.

The collector now hosts a per-attacker_ip command index
(_SessionAggregator) that watches the same parsed-event stream as
_publish_log. Shell `command` events are appended to a per-IP list;
on `session_recorded` the aggregator slices the list to commands
inside the [ended_at - duration_s, ended_at] window and publishes
attacker.session.ended with the session metadata + commands list.
The TTP worker's _build_events fan-out (E.3.18b) turns each command
into a source_kind="command" TaggerEvent that the RuleEngineTagger
(E.3.18c) matches against R0001–R0030.

Memory bound: per-IP entries TTL-evict at DECNET_COLLECTOR_SESSION_AGG_TTL_SEC
(default 3600 s). Publish failures are swallowed in the aggregator —
a misbehaving bus cannot stall the per-container stream threads.
2026-05-02 02:35:08 -04:00
d9d2a80573 fix(collector): unwrap double-wrapped RFC5424 around bash PROMPT_COMMAND
Honeypot SSH containers run `PROMPT_COMMAND` that calls
`logger --rfc5424 --msgid command -t bash "CMD …"`. The Docker-stdout
reader prepends an outer RFC5424 envelope (HOSTNAME=<decky>,
APP-NAME=1, MSGID=NIL) around that inner syslog line. Both the
collector parser (`parse_rfc5424`) and the correlation parser
(`parse_line`) saw the outer NIL MSGID and emitted `event_type="-"`
for every shell command — which:
  - kept `Attacker.commands` rows missing `command_text`
  - left R0001–R0030 (the pattern rule pack that matches shell
    commands) with no haystack
  - made `decnet.collector.log` show `event written … type=-`
    for the very lines that should be `type=command`

Both parsers now detect the inner-RFC5424 shape (`<TS> <HOST> <APP>
<PROCID> <MSGID> <rest>`) when the outer MSGID is NIL and the SD-arm
is also NIL, and re-extract HOSTNAME / APP-NAME / MSGID / remainder
from the body. The collector parser also recovers the post-SD msg
tail when the SD block isn't `relay@55555` (the bash CMD line carries
a `[timeQuality …]` block) so the kv-fallback can find `src_ip`.

Mirroring tests in tests/collector and tests/correlation pin both
the unwrap and the regression guard for non-double-wrapped lines.
2026-05-02 02:32:21 -04:00
e08bfc4a73 fix(ttp): /api/v1/ttp/rules returns the live rule catalogue
The endpoint was a contract-phase stub returning `[]` even though the
RuleStore loaded all 58 YAML rules at worker startup. UI saw an empty
table; operators couldn't tell whether anything was wired up.

- `api_list_rules` now calls `get_rule_store().load_compiled()` and
  serializes each CompiledRule + its operational state into a
  RuleCatalogueRow. Sorted by rule_id for stable golden snapshots.
- Add `description: str` to RuleSchema (pydantic) and CompiledRule
  (NamedTuple, defaulted) + propagate through `_compile_one` so the
  catalogue surfaces the human-readable YAML description, not just
  the slug-style `name`.
- Update `tests/ttp/test_rule_engine.py` _fields assertion for the
  new column; new `tests/api/ttp/test_rules_catalogue.py` pins the
  catalogue contents (R0001/R0014 presence, row shape, sort order).

Worker behaviour is unchanged: it was already loading rules
correctly. This is purely a read-side wiring fix on the operator API.
2026-05-02 01:54:06 -04:00
7ab0df3680 chore(cleaning): deleted swp vimfile 2026-05-02 01:39:17 -04:00
ca1e04033c docs(ttp): E.5 verification log appended to TTP_TAGGING.md
Closes the CDD design phase. Records:
- §E.1 contract inventory (every file exists, compileall clean).
- Targeted pytest pass: 604 passed, 1 skipped, 10 xfailed
  (all xfails are `xfail(strict=True)` with reason= pointing to the
  impl step that flips them; carry-overs, not flakes).
- Strict mypy over decnet/ttp + decnet/cli/ttp.py +
  decnet/web/router/ttp + decnet/web/db/sqlmodel_repo/ttp.py: clean.
- Stranger-readability spot check on tests/ttp/: no doc bugs.

Notes the three pre-E.4 wiring fixes (E.3.18a/b/c) and the E.4
backfill CLI / DEBT entries that landed in this series.
2026-05-02 01:37:45 -04:00
7d1f048764 docs(ttp): E.4.b/E.4.c DEBT entries — provider review + Sigma deferral
Quarterly TTP provider mapping review for AbuseIPDB / GreyNoise /
abuse.ch (Feodo Tracker, ThreatFox) catalogue drift against
`rules/ttp/R0054..R0058`, and the post-v1 trigger for the Sigma rule
adapter. Both items reference TTP_TAGGING.md sections so the
rationale stays linked to the design doc.
2026-05-02 01:35:49 -04:00