DECNET

Author	SHA1	Message	Date
anti	627fa59c15	feat(profiler/behave_shell): emit temporal.session_duration Bucket ctx.duration_s against SESSION_DURATION_SHORT_MAX (60s) / MEDIUM_MAX (600s) / LONG_MAX (3600s); else marathon. Direct measurement, confidence 0.85. Skip emission only when no commands and zero duration. New _features/temporal.py module opens Phase E.	2026-05-04 00:10:57 -04:00
anti	46775fc0e5	test(profiler/behave_shell): Phase D calibration-grid lockdown + completion log Widens the binding calibration set from PHASE_ABC_PRIMITIVES (13) to PHASE_ABCD_PRIMITIVES (17). The four unconditional Phase D primitives (cognitive_load, exploration_style, planning_depth, tool_vocabulary) join the per-shard hard gate. The three error_resilience.* primitives are conditional on at least one errored command in the shard and tracked in PHASE_D_CONDITIONAL_PRIMITIVES — excluded from the per-shard required-emission set, included in the cross-class discrimination check. cognitive_load empirical re-tune deferred to the next BEHAVE_CALIBRATION_DIR run; v0.1 thresholds ship. Phase D completion log appended to BEHAVE-EXTRACTOR.md; Phase D checkboxes flipped to [x].	2026-05-04 00:03:46 -04:00
anti	0fba6b6113	feat(profiler/behave_shell): emit cognitive.error_resilience.fallback_to_man For each errored command, check whether the next command's first_token_hash is in {man, help, info} (precomputed at module load). At least one match → present, else absent. The --help / -h flag forms aren't first tokens; v0.2 will reconsider once arg-token hashing is justified by corpus.	2026-05-04 00:01:45 -04:00
anti	8183218d29	feat(profiler/behave_shell): emit cognitive.error_resilience.frustration_typing Compares median within-command IAT for commands following an errored command vs commands following a successful one. Relative absolute delta buckets to low / moderate / high. Skips when either group is empty (no errors, or no clean baseline). v0.1; D.8 re-tunes.	2026-05-04 00:00:36 -04:00
anti	b704352783	feat(profiler/behave_shell): emit cognitive.error_resilience.retry_tactic Modal response across Command.errored=True commands: * same first_token_hash on next command → rerun * different first_token_hash → switch * no next command → abort Tiebreak in registry order. The fourth registry value 'modify' requires within-command arg diffing (PII boundary); deferred to v0.2.	2026-05-03 23:58:58 -04:00
anti	f286c84d95	feat(profiler/behave_shell): emit cognitive.tool_vocabulary Absolute distinct first_token_hash count, bucketed against TOOL_VOCAB_NARROW_MAX / TOOL_VOCAB_BROAD_MIN. v0.1; D.8 re-tunes.	2026-05-03 23:56:22 -04:00
anti	6c2e4ada83	feat(profiler/behave_shell): emit cognitive.planning_depth Distribution of inter-command IATs bucketed against IKI_THINK_MAX_S (deep) and INTER_CMD_INSTANT_MAX (reactive); fall-through is shallow. v0.1 thresholds; D.8 re-tunes.	2026-05-03 23:55:16 -04:00
anti	2254651270	feat(profiler/behave_shell): emit cognitive.exploration_style Two-axis classification over the first_token_hash sequence: repetition_rate (drilling) vs backtrack_rate (jumping among prior tools). chaotic/targeted/methodical buckets. v0.1 thresholds; D.8 re-tunes.	2026-05-03 23:54:03 -04:00
anti	f948e10830	feat(profiler/behave_shell): emit cognitive.cognitive_load Composite over three [0, 1]-clipped sub-signals (chunking variance, error rate from D.0's Command.errored, pace variability), mean-aggregated and bucketed against COGNITIVE_LOAD_LOW_MAX / COGNITIVE_LOAD_MEDIUM_MAX. Components missing data drop out of the mean rather than zeroing it. v0.1 thresholds; D.8 re-tunes once D.2-D.7 are stable. Confidence held at 0.60 (composite over soft sub-signals) and halved below the 5-command sample-size floor.	2026-05-03 23:52:29 -04:00
anti	601986bd6d	feat(profiler/behave_shell): output error-signal helper for Phase D Lifts the error-signal slice of F.0 forward as a D.0 prelude. ANSI strip + canonical bash/sh error fingerprints classify each command's post-execution output window; Command gains errored / output_bytes fields. PII discipline preserved — only a bool and an int leave the helper, the stripped output text is dropped on return. Drives D.1 (cognitive_load error_rate term) and D.5–D.7 (error_resilience family). Phase F.0 will subsume this with PS1 + exit-code parsing.	2026-05-03 23:46:31 -04:00
anti	bc62e42ce1	feat(profiler/behave_shell): emit motor.shell_mastery.pipe_chaining_depth	2026-05-03 23:34:54 -04:00
anti	4fc980e968	feat(profiler/behave_shell): emit motor.shell_mastery.shortcut_usage	2026-05-03 23:33:07 -04:00
anti	a077cf67c8	feat(profiler/behave_shell): emit motor.shell_mastery.tab_completion	2026-05-03 23:31:20 -04:00
anti	8161c67ec5	feat(profiler/behave_shell): emit motor.command_chunking BEHAVE-EXTRACTOR.md Phase B Step B.4. First implementation — prototype doesn't ship this primitive. * SessionContext gains intra_command_iats: per-command tuple of IATs between consecutive input events whose timestamps fall inside [cmd.start_ts, cmd.end_ts). Excludes the terminator IAT. Built by _per_command_iats. * _features/motor.py:command_chunking(ctx) emits one Observation in {fluent, fragmented, single_command}. - 0 commands → skip emit - 1 command → single_command (registry-allowed point) - ≥2 commands → median CV across per-command typed-IATs; < CMD_CHUNKING_FLUENT_CV_MAX (0.50) → fluent, else fragmented - paste-only sessions (no command has ≥3 typed IATs) → skip emit (no honest within-command rhythm to measure) Confidence 0.80 / 0.65 / 0.60. * Calibration grid widened to include motor.command_chunking; green across all five shards. Phase B primitive set complete. Tests: no commands → skip, 1 command → single_command, uniform typing → fluent, alternating fast/slow → fragmented, paste-only multi-command → skip emit.	2026-05-03 21:29:31 -04:00
anti	d04f91cd8c	feat(profiler/behave_shell): emit motor.error_correction BEHAVE-EXTRACTOR.md Phase B Step B.3. Replaces the prototype's two-line "0 vs >0 backspaces" placeholder with a backspace-timing classifier that honours the registry's full vocabulary. * SessionContext gains backspace_count, backspace_iats (IAT from each backspace back to the preceding non-backspace input event), and kill_line_count (^U / ^W). Built by _scan_correction_signals, which retains only counts and timing aggregates — no character data leaves the helper, in line with the BEHAVE PII discipline. * _features/motor.py:error_correction(ctx) emits one Observation in {immediate, deferred, absent, route_around}. - 0 backspaces + ≥1 ^U/^W → route_around (rewrite, not correct) - 0 backspaces + 0 kill-lines → absent - backspaces with median IAT ≤ 500 ms → immediate - slower → deferred Confidence 0.65 / 0.65 / 0.55 / 0.55. * < 3 inputs → skip emit. * Calibration grid widened to include motor.error_correction; green across all five shards. Tests cover all four buckets, the < 3 inputs skip, and the PII regression (raw command body never appears in the serialised observation).	2026-05-03 21:27:46 -04:00
anti	0737fcfe93	feat(profiler/behave_shell): emit motor.motor_stability BEHAVE-EXTRACTOR.md Phase B Step B.2. First principled implementation — the prototype doesn't ship this primitive at all. * _features/motor.py:motor_stability(ctx) emits one Observation in {steady, variable, tremor}. Reuses ctx.typing_bursts from B.1. * Tremor proxy: fraction of within-burst IATs below TREMOR_FAST_FLOOR_S (30 ms — humans can't sustain sub-50 ms IATs). ≥ TREMOR_RATE_MIN (10%) sub-floor → tremor (double-press / motor twitch / stuck-key). * Otherwise median burst CV decides: < CV_STEADY_MAX → steady, else → variable. Confidence 0.70 / 0.60 / 0.65. * No typing bursts or fewer than 5 within-burst IATs → skip emit. * Calibration grid widened to include motor.motor_stability; green across all five shards. Tests cover all three buckets + skip paths.	2026-05-03 21:25:54 -04:00
anti	d90c8b70ce	feat(profiler/behave_shell): emit motor.keystroke_cadence BEHAVE-EXTRACTOR.md Phase B Step B.1. * SessionContext gains typing_bursts: tuple[tuple[float, ...], ...] built by _split_typing_bursts(iats) — splits at gaps > IKI_THINK_MAX_S (1.5s) and drops bursts of fewer than 3 IATs. Mirrors prototype's _split_into_bursts at BEHAVE/prototype_extractors/shell/extract.py:275. * _features/motor.py:keystroke_cadence(ctx) emits one Observation in {steady, bursty, hunt_and_peck, machine}. Median CV across typing bursts; mean IKI < IKI_MACHINE_MAX_S paired with CV < CV_MACHINE_MAX → machine. Confidence 0.85/0.70/0.65/0.60 per the prototype's calibration history. * < MIN_INPUTS_FOR_CADENCE inputs or zero typing bursts → skip emission. v0.1 emits only the burst-CV variant; the prototype's NAIVE session-CV variant is parked for v0.2. * Calibration grid widened (PHASE_A_PRIMITIVES → PHASE_AB_PRIMITIVES) to include motor.keystroke_cadence. Grid green across all five shards. Tests: too-few-inputs → no emit, all-think-pauses → no burst → no emit, uniform IATs → steady, sub-5ms → machine, mixed-pace → bursty, extreme bimodal → hunt_and_peck.	2026-05-03 21:24:13 -04:00
anti	640294f3dc	test(profiler/behave_shell): five-class calibration grid lockdown BEHAVE-EXTRACTOR.md Phase A Step 9 — the gate. Runs the pure engine against each of the five 2026-05-02 calibration shards and pins the contract that all subsequent Phase B-G PRs must keep green: every Phase A primitive (motor.input_modality, motor.paste_burst_rate, cognitive.inter_command_latency_class, cognitive.command_branch_diversity, cognitive.feedback_loop_engagement, cognitive.inter_command_consistency) fires at least once per shard. * tests/profiler/behave_shell/test_calibration_grid.py parametrized over (shard_file, class_label) for HUMAN / YOU-sim / LW-sim / CLAUDE-FF / CLAUDE-CL. Skips entirely when BEHAVE_CALIBRATION_DIR is unset (CI provides the path; local dev doesn't have to). * Plus a discrimination-smoke check: at least one primitive produces different majority values across present classes — catches the "constant-output regression" failure mode where the engine quietly degenerates to a stub. Calibration tweak: BRANCH_DIVERSITY_LINEAR_MIN dropped from 0.80 to 0.70 to align with the prototype's empirical anchors (CLAUDE-CL ≈ 0.55-0.60 adaptive; YOU-sim / CLAUDE-FF scripted recon ≈ 0.75+ linear). Test for the middle band re-pinned at the new boundary. Per-class value pinning (e.g. HUMAN must emit inter_command_consistency=bimodal) is intentionally NOT a hard gate yet — v0.1 thresholds put real human sessions in "variable", and true bimodal detection (Hartigan dip / two-peak) is registry-flagged for v0.2. Tighter pinning lands as the corpus grows.	2026-05-03 08:00:50 -04:00
anti	842b7de950	feat(profiler/behave_shell): emit cognitive.inter_command_consistency BEHAVE-EXTRACTOR.md Phase A Step 8. Dispersion / bimodality of inter-command pauses. HUMAN-bimodal vs LLM-metronomic. * _features/cognitive.py:inter_command_consistency(ctx) emits one Observation in {metronomic, variable, bimodal}. * CV = stdev / mean of ctx.inter_cmd_iats. CV < 0.40 → metronomic (LLM-pure; corpus anchor 0.24); CV ≥ 1.50 → bimodal heuristic (LLM-assisted human; v0.1 placeholder, true bimodal via Hartigan dip is registry-flagged for v0.2); else → variable (human; corpus anchor 0.94). * < 2 IATs or zero mean → skip emission. < 5 commands halves confidence (0.40 vs 0.75) per sample-size honesty. Tests: too-few IATs → no emission, uniform → metronomic, human-like dispersion → variable, extreme bursts+gaps → bimodal, low-sample-count → reduced confidence. Step 8 closes the six-primitive calibration floor for Phase A. Step 9 (calibration grid lockdown) is the gate that pins it.	2026-05-03 07:56:49 -04:00
anti	2f8c107e70	feat(profiler/behave_shell): emit cognitive.feedback_loop_engagement BEHAVE-EXTRACTOR.md Phase A Step 7. The orthogonal axis — does the operator's pause-after-command correlate with bytes of output they just saw? Splits HUMAN/CLAUDE-CL (closed_loop) from LW-sim/CLAUDE-FF (fire_and_forget); cuts ACROSS the LLM/human axis. * _features/cognitive.py:feedback_loop_engagement(ctx) emits one Observation in {closed_loop, fire_and_forget, unknown}. * Pearson correlation between ctx.output_per_cmd[i] and ctx.inter_cmd_iats[i] (paired by construction in Step 4); via statistics.correlation with constant-series fallback to "unknown". * r > FEEDBACK_CORRELATION_MIN (0.30) → closed_loop; otherwise (zero, negative, or undefined) → fire_and_forget. * First primitive that depends on output events: zero output events in the shard or fewer than FEEDBACK_MIN_PAIRS (5) pairs → emit "unknown" at confidence 1.0 (the absence-of-data is itself a high-confidence answer). Zero-command session skips entirely. Tests: no-output → unknown, few-pairs → unknown, strong positive r → closed_loop, constant pace → fire_and_forget/unknown, negative r → fire_and_forget.	2026-05-03 07:55:38 -04:00
anti	3fc6ea5f75	feat(profiler/behave_shell): emit cognitive.command_branch_diversity BEHAVE-EXTRACTOR.md Phase A Step 6. Content-based playbook-vs-adaptive split. Splits CLAUDE-FF (linear_playbook, ~10 distinct tools) from CLAUDE-CL (adaptive_branching, 5-6 tools with curl re-invoked) per the 2026-05-02 empirical anchor. * _features/cognitive.py:command_branch_diversity(ctx) emits one Observation in {linear_playbook, adaptive_branching, unknown}. * unique_first_token_hashes / total_commands ratio. ≥ 0.80 → linear_playbook, otherwise adaptive_branching (the doc instructs bias-to-adaptive in the middle band — that's the discriminative signal we actually want). * < 5 commands → "unknown" at confidence 1.0 (the absence of data is itself a high-confidence answer per the registry's allowed vocabulary). Zero-command session skips emission entirely. Tests cover unique-tokens → linear, repeated-tokens → adaptive, middle band → adaptive (bias), under-floor → unknown @ 1.0, plus PII regression: raw tokens never appear in the serialised observation.	2026-05-03 07:54:13 -04:00
anti	e52a0e0381	feat(profiler/behave_shell): emit cognitive.inter_command_latency_class BEHAVE-EXTRACTOR.md Phase A Step 5. Classifies the operator's thinking pace between commands. Splits LW-sim / CLAUDE-FF / CLAUDE-CL. * _features/cognitive.py:inter_command_latency_class(ctx) emits one Observation in {instant, typing_speed, deliberate, llm_lightweight, llm_heavyweight, long}, computed as the median of ctx.inter_cmd_iats bucketed against the prototype thresholds (v0.2 split: lightweight 2-8s, heavyweight 8-30s). * Sample-size honesty: < 5 commands halves confidence (0.40 vs 0.80) per BEHAVE-EXTRACTOR.md. * Threshold consts (INTER_CMD_*_MAX, MIN_COMMANDS_FOR_FULL_CONFIDENCE, plus parked Step 6/7/8 thresholds for the next three commits) added to _thresholds.py. Tests cover all six buckets at empirically-anchored IATs (15s ≈ Claude Opus driving recon via tmux send-keys), plus the single-command no-IAT and low-sample-count paths.	2026-05-03 07:52:39 -04:00
anti	f3880b24d1	feat(profiler/behave_shell): command segmentation in SessionContext BEHAVE-EXTRACTOR.md Phase A Step 4. Pure refactor inside _ctx.py — no new feature emits. Lays the shared utility for the three cognitive primitives next in line (Steps 5-7). * Command dataclass (frozen): start_ts, end_ts, first_token_hash. PII-safe by construction — only the first whitespace-delimited token of the command is retained, and only as a sha256 hash (decnet/profiler/behave_shell/_parse.py:hash_token). * _segment_commands walks input events char-by-char, splits on \r / \n, hashes the first token, drops the rest. * SessionContext gains commands, inter_cmd_iats, output_per_cmd. output_per_cmd[i] counts bytes between commands[i].end_ts and commands[i+1].start_ts — the natural pairing for Step 7 (feedback_loop_engagement). Tests: empty / unterminated streams, single command (CR + LF terminators), paste-with-newline, multi-command IAT pairing, output-byte counting between boundaries, blank-line skip, first-token-only PII discipline.	2026-05-03 07:50:55 -04:00
anti	6763fceb0b	feat(profiler/behave_shell): emit motor.paste_burst_rate BEHAVE-EXTRACTOR.md Phase A Step 3. Same paste-event ratio as motor.input_modality but coarser-bucketed: this is the habit signal (does the operator reach for paste at all?), where input_modality is the dominant-channel signal. * _features/motor.py:paste_burst_rate(ctx) emits one Observation per session in {none, occasional, habitual} with confidence 0.70 / 0.70 / 0.80. * Thresholds: PASTE_RATE_OCCASIONAL_MIN=0.10, PASTE_RATE_HABITUAL_MIN=0.50. Splits YOU-sim from LW/CLAUDE-FF/CLAUDE-CL — LLM-driven sessions paste habitually, real humans rarely paste. Tests: pure-typed → none; 1-paste-in-10 → occasional; paste-majority → habitual; output-only → no observation; habitual confidence > occasional confidence.	2026-05-03 07:49:03 -04:00
anti	879f5e731b	feat(profiler/behave_shell): emit motor.input_modality BEHAVE-EXTRACTOR.md Phase A Step 2. The first primitive — picked first because it has the highest discriminative value (HUMAN vs everyone) and the simplest implementation (paste-event ratio over total inputs). * _features/motor.py:input_modality(ctx) emits one Observation per session in {typed, pasted, mixed} with confidence 0.75 / 0.70. * _features/_emit.py centralises the make_observation helper so every feature module gets the same Window/source/evidence_ref boilerplate without copy-paste. * Thresholds inherited from the prototype's calibration history (MODALITY_PASTED_MIN=0.40, MODALITY_TYPED_MAX=0.05). * Zero-input session skips emission — registry doesn't admit "unknown" here. Tests: pure-typed → typed, pure-pasted → pasted, mixed → mixed, output-only session → no observation, full envelope round-trip.	2026-05-03 07:47:38 -04:00
anti	c9a81a23c2	feat(profiler/behave_shell): asciinema parser + paste-burst detection BEHAVE-EXTRACTOR.md Phase A Step 1. Lays the shared primitives that Steps 2-3 (motor.input_modality, motor.paste_burst_rate) will consume: * parse_shard_line / parse_shard turn a shard JSONL line/file into AsciinemaEvents, skipping headers and malformed records. * PasteBurst dataclass + _detect_paste_bursts group consecutive paste-class input events (len(d) >= 4 chars per the prototype's empirical floor) into contiguous bursts, splitting on IAT gaps larger than PASTE_BURST_MAX_IAT_S (200ms). * SessionContext now carries iats and paste_bursts derivations. * Threshold constants harvested from BEHAVE/prototype_extractors/shell/extract.py — calibrated against the five 2026-05-02 shards. Tests cover pure-typed, pure-pasted, mixed streams; close vs far paste events; typed events breaking a burst; PasteBurst immutability; and the JSON parser's junk handling.	2026-05-03 07:46:01 -04:00
anti	f8eae04e5d	feat(profiler/behave_shell): scaffold extract_session entry point BEHAVE-EXTRACTOR.md Phase A Step 0. Lays the package skeleton (__init__/extract/_parse/_ctx/_thresholds/_features) with empty FEATURES = (), so the worker plumbing in BEHAVE-INTEGRATION Phase 4 has a stable import path before any primitive lands. extract_session() builds a SessionContext once and fans the registered feature functions across it; at Step 0 that fan-out is empty and the function yields nothing. Step 1 (asciinema parser + paste-burst detector) and Step 2 (motor.input_modality) land next. Smoke suite asserts the empty contract: empty stream → no observations, single event → t_start == t_end, multi-event → events routed into input_events / output_events by kind, evidence_ref defaults to "session:<sid>" or honours an explicit override.	2026-05-03 07:42:09 -04:00
anti	a2a61b636e	feat(web): drop SessionProfile, wire observations into AttackerDetail (DEBT-050 / DEBT-036 closure) Destructive half of BEHAVE-INTEGRATION.md Phase 1. SessionProfile + its kd_* columns + the dialect ALTER TABLE migration helpers are deleted outright; pre-v1, the table shipped empty, no migration ceremony required (per the no-new-_migrate_-pre-v1 memory rule). DEBT-036 closes via DEBT-050 supersedure. AttackerDetail's ``observations`` field is wired to the new ``observations`` table and returns an empty list until the BEHAVE-SHELL extractor (DEBT-050 Phase 2) starts emitting. decnet/web/db/models/attackers.py — SessionProfile class deleted (~135 lines), KD_PAUSE_*/KD_START_OF_ACTION_IDLE_S module constants deleted, module docstring updated to point at the observations table. AttackerIdentity.kd_digraph_simhash is KEPT — it's the v2 federation centroid hook, not a SessionProfile field; docstring repointed to the BEHAVE primitive that will populate it. decnet/web/db/sqlmodel_repo/attackers/sessions.py — DELETED. SessionProfilesMixin dropped from the AttackersMixin MRO. decnet/web/db/repository.py — abstract upsert_session_profile + get_session_profile removed. decnet/web/db/sqlite/repository.py + mysql/repository.py — _migrate_session_profile_table helpers and their initialize() calls removed. mysql initialize() now goes attackers → column_types → admin (no session_profile step). decnet/web/db/models/__init__.py — SessionProfile re-export gone. decnet/web/db/models/attacker_intel.py — docstring cross-reference to SessionProfile.schema_version retargeted to AttackerIdentity. decnet/web/router/attackers/api_get_attacker_detail.py — adds ``observations: []`` to the response by calling ``repo.latest_observation_per_primitive(uuid)`` and projecting to a list sorted by primitive path. Empty until the extractor lands; shape matches BEHAVE-INTEGRATION.md §"AttackerDetail consumer". tests/profiler/test_session_profile.py — DELETED (56 lines). 
tests/db/test_base_repo.py — DummyRepo loses upsert_session_profile and get_session_profile overrides. tests/db/mysql/test_mysql_migration.py — initialize-call-order assertion updated; session_profile step removed from the expected sequence; docstring records why. tests/ttp/test_lifter_absence.py — docstring "no SessionProfile" → "no ObservationRow".	2026-05-03 07:33:37 -04:00
anti	0972325527	feat(web/db): observations table + repo + bus prefix (BEHAVE-INTEGRATION Phase 1) Additive Phase 1 of BEHAVE-INTEGRATION.md. Lays the storage layer the BEHAVE-SHELL extractor (DEBT-050) will write into. Nothing breaks; SessionProfile coexists for now and is dropped in the follow-up commit. decnet/web/db/models/observations.py — new ObservationRow SQLModel mirroring the BEHAVE Observation envelope field-for-field (core/decnet_behave_core/spec/envelope.py). ``id`` is a hex-string UUID (matching BEHAVE), not a typed UUID column. ``identity_ref`` is str | None — written by the future attribution engine, NULL until then. ``attacker_uuid`` is the one DECNET-side denormalisation; FK'd to attackers.uuid for cheap AttackerDetail joins. ``evidence_ref`` is NOT NULL for DECNET emissions even though the upstream envelope makes it optional — the worker's "already profiled?" check keys on it. UniqueConstraint(evidence_ref, primitive) enforces idempotency at the schema level so re-running the extractor on the same shard+sid produces a DB-side conflict the upsert path resolves deterministically. Class is named ``ObservationRow`` (not ``Observation``) to avoid colliding with the BEHAVE Pydantic envelope at sites that import both. decnet/web/db/sqlmodel_repo/observations.py — ObservationsMixin. Three public methods backing the canonical queries from BEHAVE-INTEGRATION.md §"Storage": ``upsert_observation`` (idempotent on the natural key), ``latest_observation_per_primitive`` (per-primitive MAX(ts) subquery, portable across SQLite and MySQL — no DISTINCT ON), ``observations_time_series`` (asc-by-ts). Plus ``has_observations_for_evidence`` for the worker's session-already-profiled check. decnet/bus/topics.py — ATTACKER_OBSERVATION_PREFIX = "observation" constant + ``attacker_observation(primitive)`` builder. Full topic shape ``attacker.observation.<primitive>`` matches what BEHAVE's spec.event_adapter.event_topic_for produces upstream.
Documentation + pattern matching only — bus auth is socket file perms (DEBT-029 §2), not topic-level. decnet/web/db/repository.py — abstract ``upsert_observation``, ``latest_observation_per_primitive``, ``observations_time_series`` on BaseRepository. tests/db/test_observations.py — 11 tests covering upsert round-trip, idempotency under the unique constraint, latest-per-primitive ordering across multiple sessions, time-series asc-ordering, empty-attacker contract, every BEHAVE ValueKind round-tripping through the JSON column, and the has_observations_for_evidence check. tests/db/test_base_repo.py — DummyRepo gains the three new abstract overrides so its coverage suite still instantiates.	2026-05-03 07:25:10 -04:00
anti	3f080f601d	feat(intel,ingester): mal_hash feed + observed_attachments table (DEBT-046) New MalHashProvider sibling ABC (decnet/intel/base.py) since SHA-256 is a different keyspace from IntelProvider's IPs. MalwareBazaarProvider mirrors FeodoProvider's bulk-feed shape: 24h refresh via _ensure_fresh / _refresh, in-memory set[str] of hex-lowercased hashes, set-membership lookup. Auth-keyed via DECNET_MALWAREBAZAAR_AUTH_KEY; absent key silent-no-ops the lane (single warning, no HTTP traffic). Per-hash observations persist to a new observed_attachments table. DECNET is a honeypot platform — every attachment hash an attacker delivers is intel, regardless of whether anyone classified it. Verdict is sticky: True never downgrades to False/None on subsequent observations. Out of scope: API surface, federation export, retention. Ingester _publish_email_received calls the provider for each attachment sha256, sets mal_hash_match on the bus payload (omitted entirely when the message had no attachments — keeps R0046's `is True` predicate silent on hash-less mail, matching pre-paydown behavior), and upserts the row regardless of provider availability.	2026-05-03 05:56:46 -04:00
anti	03beff3840	feat(orchestrator): authoritative failure-count badge endpoint (DEBT-042) New GET /api/v1/orchestrator/events/stats?since=1h&success=false&kind=... backed by repo.count_orchestrator_failures(since_ts, kind), which counts failed rows across both orchestrator_events and orchestrator_emails since the cutoff. Window parser accepts ^\d+[smhd]$, capped at 7d. Today only success=false is accepted on this surface so the endpoint isn't accidentally repurposed before the next consumer is properly designed. Orchestrator.tsx polls the endpoint on mount + every 30 s and renders the authoritative DB-derived count instead of deriving from the in-memory SSE buffer + one paginated page (which silently excluded failures older than the local window).	2026-05-03 05:26:45 -04:00
anti	6c6f97e840	feat(prober,correlation): attacker fingerprint rotation detection (DEBT-032) When the prober observes a NEW hash for an (attacker_uuid, port, probe_type) triple it has seen before — VPS rotation, SSH server rebuild, TLS cert swap — emit a derived attacker.fingerprint_rotated event carrying both old and new hash. Detection is a small library (decnet.correlation.fingerprint_rotation) called inline from the prober at each of the three emit sites (JARM/HASSH/TCPFP). No new daemon. New AttackerFingerprintState table holds per-triple last-hash state; Attacker.rotation_count and Attacker.last_rotation_at are stamped on every diff. Library is sync, fully unit-tested via injected publish_fn / syslog_fn callbacks.	2026-05-03 05:12:51 -04:00
anti	b3a96a045f	feat(mail): default email_seed → $PROJROOT/bait/ when unset When service_cfg["email_seed"] is absent, compose_fragment now falls back to $PROJROOT/bait/ if that directory exists on the host. Lets operators drop a deployment-wide bait corpus into one place without threading email_seed through every decky's config. Missing dir keeps old no-op behavior.	2026-05-03 04:25:24 -04:00
anti	b88d67794d	feat(mail): operator-tunable IMAP/POP3 email seed (DEBT-026) IMAP_EMAIL_SEED / POP3_EMAIL_SEED accept a directory (rglob .eml + .json) or a single .json/.eml. Loaded entries CONCATENATE with the hardcoded _BAIT_EMAILS — additive to the realism-engine emailgen output rather than replacing it. JSON dicts require from_addr / to_addr / subject / body; bare bodies are wrapped into RFC 5322 on load. compose_fragment reads service_cfg["email_seed"] and bind-mounts the host path read-only at /var/spool/decnet-emails/seed.	2026-05-03 02:47:06 -04:00
anti	79674026dd	feat(cli): allow `decnet ttp` on agents (DEBT-047) The TTP-tagging worker is now safe to run on agent hosts: EmailLifter disk-reaches body-aware predicates from the local artifacts tree (DEBT-035 unblocked filesystem access; DEBT-047 added the helper). Drop `ttp` from MASTER_ONLY_COMMANDS in cli/gating.py and remove the defence-in-depth `_require_master_mode("ttp")` call in cli/ttp.py. `ttp-backfill` walks the master DB and stays master-only.	2026-05-02 20:07:03 -04:00
anti	e972d870de	feat(ttp): EmailLifter disk-reach for body-aware predicates (DEBT-047) R0047 (BEC) and the encoded-payload predicate substring-match against the email body. Shipping raw body text on the abstracted service bus is the wrong privacy stance — the bus transport may swap from UNIX socket to networked at any time, and "loopback today" is not a license to put PII on the wire. EmailLifter now opens the .eml lazily from /var/lib/decnet/artifacts/{decky_id}/smtp/{stored_as} when a body-aware predicate runs and parses the body in-process via stdlib email + policy.default. The decoded body is memoized into the payload dict so multiple body-aware predicates on the same event open the file once. Bus envelope only carries the artifact pointer (decky_id + stored_as); raw body bytes never cross the host disk boundary on the agent → master hop. Filesystem access on agents is unblocked by DEBT-035 (setgid + group-readable artifacts root, paid 2026-05-02). The legacy inline body_text path is preserved — when the producer ships body_text on the bus the helper short-circuits without opening the file.	2026-05-02 20:05:54 -04:00
anti	7036a86e76	refactor(artifacts): extract resolve_artifact_path to shared module Move artifact path validation + symlink-escape check out of the admin-gated download endpoint into decnet/artifacts/paths.py so the TTP EmailLifter can disk-reach .eml files at tag-time without duplicating regex/root logic (DEBT-047). The router now catches ArtifactPathError and re-raises HTTPException(400); behavior is unchanged.	2026-05-02 20:02:47 -04:00
anti	cdbb3d3571	fix(ssh,telnet): move PROMPT_COMMAND out of /root/.bashrc + pin readonly ANTI flagged two regressions in the existing command-event capture: 1. Tell: PROMPT_COMMAND lived in /root/.bashrc, the FIRST file an attacker greps after landing root. The logger invocation sitting there is plain-text honeypot signage. 2. Bypass: even when missed, `export PROMPT_COMMAND=""` silently disables capture. ANTI personally bypasses this on engagements. Reshape: * Move the assignment to /etc/environment — read by pam_env at session open (sshd via /etc/pam.d/sshd, telnet via /etc/pam.d/login), before any shell rc file fires. Far less obvious than .bashrc; a casual `cat .bashrc` no longer surfaces the capture. * Define the helper as a function `__bash_history_sync` in /etc/bash.bashrc (system-wide bashrc, sourced by every interactive bash). Function name reads as generic bash housekeeping; no DECNET branding in the symbol. * Pin both the function and PROMPT_COMMAND readonly so `export PROMPT_COMMAND=""` fails with "readonly variable" instead of silently winning. Mitigation, not airtight — `bash --norc` still bypasses — but the passive `export` bypass is closed. The actual `logger --rfc5424 --msgid command ... CMD ...` invocation is preserved exactly; only its location and the readonly guard change. R0001–R0030 (command-rule pack) consume the same syslog shape as before. Three new tests assert: the value lands in /etc/environment, the function body lives in /etc/bash.bashrc, no PROMPT_COMMAND line remains in /root/.bashrc, and `readonly PROMPT_COMMAND` / `readonly -f __bash_history_sync` are both present. Mirror assertions added on the Telnet Dockerfile via test_config_schema.py.	2026-05-02 19:50:24 -04:00
anti	3e9c4c29b9	feat(ssh,telnet): add non-root user account for privesc + enum lure Real Linux deployments (especially Ubuntu cloud images) ship a non- root admin user; honeypots that only accept root logins are a tell. Add a second account on both SSH and Telnet decoys, configurable via service_cfg keys `user` / `user_password`, defaulting to `ubuntu` / `admin` so the lure is live on every fresh deploy. * `decnet/services/{ssh,telnet}.py` — two new ServiceConfigFields (`user` string, `user_password` secret) and matching env vars (`SSH_USER` / `SSH_USER_PASSWORD`, mirror for telnet) propagated via the compose fragment. * `decnet/templates/ssh/entrypoint.sh` — runtime `useradd -m -s /usr/libexec/login-session -G sudo "$SSH_USER"` so the new user inherits the same sessrec pty-recording shell as root and lands in the sudo group. Privesc attempts (`sudo`) flow through the existing sudo-log capture; network-enum from the user's shell rides the recorded transcript. * `decnet/templates/telnet/entrypoint.sh` — same useradd pattern (no sudo group — busybox+login telnet image has no sudo package; privesc rides `su -` which itself flows through the existing PAM auth-helper at /etc/pam.d/login). * New tests for default + custom user / password + independence from root password. Updated the schema-keys assertion to match the four-field shape. The new account is ALSO the natural home for the body-aware predicates that were previously gated on root-only sessions — attackers who land on `ubuntu@host` and run network-recon / privesc commands now generate the same structured TTP-rule events as root sessions did, captured via the same auth-helper + sessrec + sudo-log pipes.	2026-05-02 19:48:03 -04:00
anti	b27332169d	feat(init): create /var/lib/decnet/artifacts with setgid + group-write DEBT-035 step 2. Today the artifacts subtree is auto-created by Docker as root when a decoy container's bind-mount fires for the first time. The resulting permissions are root:root 0o755 — the API process (running as the decnet user) hits PermissionError trying to read transcripts written by the container, and the soft-fail 404 path gets exercised on every fresh deploy. Add `/var/lib/decnet/artifacts` to init's dirs list with mode 0o2775: * 0o2000 — setgid bit. New files inherit the directory's group (decnet), regardless of which uid created them. This is the load- bearing bit for cross-container reads. * 0o0775 — owner+group rwx, world rx. Group-write lets the API process and the local TTP worker read each other's outputs without a manual chown. `_ensure_dir` already respects the full mode word via `os.chmod`, no helper change needed. Test asserts the resulting directory carries exactly 0o2775 after a fresh `decnet init --prefix`. Defence-in-depth: this works even if the per-decoy compose `user:` directive (next commit) misses a template — files still land in the decnet group.	2026-05-02 19:35:20 -04:00
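As a rough illustration of the mode word in this commit, the sketch below creates a directory and then applies the full mode with `os.chmod`, since `os.makedirs` masks its mode argument with the umask. The `ensure_dir` and `describe` helpers are hypothetical stand-ins, not the project's `_ensure_dir`.

```python
import os
import stat

ARTIFACTS_MODE = 0o2775  # setgid (0o2000) + rwxrwxr-x (0o0775), per the commit


def ensure_dir(path: str, mode: int) -> None:
    # Hypothetical helper: create the directory, then set the exact mode.
    os.makedirs(path, exist_ok=True)
    # chmod applies the full mode word; makedirs alone would be umask-filtered.
    os.chmod(path, mode)


def describe(mode: int) -> dict:
    # Split a mode word into the setgid flag and the rwx permission bits.
    return {"setgid": bool(mode & stat.S_ISGID), "perms": oct(mode & 0o777)}
```

Note that some platforms silently drop the setgid bit when the calling user is not a member of the directory's group, so a test on the real path should run with the intended user and group.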
anti	39a298f685	feat(init): persist DECNET-service api-user/api-group to decnet.ini DEBT-035 step 1. The composer needs to know which uid/gid to inject into each compose fragment's `user:` directive at deploy time. Today the resolved `--user` / `--group` values reach systemd unit rendering (init.py:349–354) but are not persisted anywhere the composer can read them. Persist as names (not numeric ids) under `[decnet] api-user` / `api-group` in the rendered decnet.ini placeholder. Resolution to uid/gid happens at deploy time on whichever host runs the deploy, via `pwd.getpwnam(...)` / `grp.getgrnam(...)` — so the same user name can have different uids on master vs agents (heterogeneous /etc/passwd) without breaking artifact ownership. The existing config_ini auto-translates kebab→DECNET_API_USER / DECNET_API_GROUP at load time; no domain-map changes needed. Two new tests: one asserting the rendered ini carries the `api-user` / `api-group` keys for the values passed to `--user` / `--group`; one round-tripping through `load_ini_config` to confirm the env vars land in `os.environ` for the composer to pick up.	2026-05-02 19:33:53 -04:00
anti	c714941069	feat(bus): project EmailLifter heavyweight fields onto email.received The decky's Layer-2 extension (commit `291b78c1`) emits body_simhash / body_base64_bytes / html_smuggling on the message_stored log and adds macro_indicator / encrypted booleans to each attachments_json manifest entry. Lift them all onto the email.received bus payload: * body_simhash — passes through as-is (16 hex chars or "") * body_base64_bytes — coerced to int (0 on absent / malformed) * attachment_macros / attachment_password_protected — OR-reduced across the per-attachment manifest booleans; matches R0046's matched_trigger semantics where a single positive lane fires the rule * html_smuggling — coerced bool from the decky's 0/1 int Pre-Layer-2 message_stored events (older deckies, malformed log rows) project to safe defaults: empty simhash, zero base64-bytes, all booleans False — the EmailLifter then stays silent, never fires a false positive on missing data. R0042 (mass-phish) / R0046 macro / R0046 password / R0046 smuggling / R0048 (encoded payload) all fire end-to-end after this commit. R0046 mal_hash_match and R0047 BEC remain deferred per their respective DEBT entries (filed in the next commit).	2026-05-02 19:10:30 -04:00
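The projection rules listed in this commit (pass-through, int coercion, OR-reduction, safe defaults) might look something like the sketch below. The field names are taken from the commit message; the `project_heavyweight` function and the flat-dict input shape are assumptions for illustration only.

```python
import json
from typing import Any


def _to_int(value: Any) -> int:
    # Absent or malformed values coerce to 0.
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def project_heavyweight(fields: dict) -> dict:
    # Hypothetical function: `fields` stands in for a message_stored log row.
    try:
        manifest = json.loads(fields.get("attachments_json") or "[]")
    except (TypeError, ValueError):
        manifest = []
    if not isinstance(manifest, list):
        manifest = []
    entries = [e for e in manifest if isinstance(e, dict)]
    return {
        "body_simhash": str(fields.get("body_simhash") or ""),
        "body_base64_bytes": _to_int(fields.get("body_base64_bytes")),
        # OR-reduce: one positive attachment is enough to set the flag.
        "attachment_macros": any(bool(e.get("macro_indicator")) for e in entries),
        "attachment_password_protected": any(bool(e.get("encrypted")) for e in entries),
        "html_smuggling": bool(_to_int(fields.get("html_smuggling"))),
    }
```

With an empty input every field falls back to its safe default, which matches the commit's stated goal that pre-Layer-2 events never trigger a rule on missing data.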
anti	291b78c1d0	feat(smtp): extract body_simhash + base64-bytes + html-smuggling + per-attachment macro/encrypted Heavyweight Layer-2 extractors land alongside the cheap projections shipped in commit `e9324aca`, so the EmailLifter R0042 / R0046 (macros / password / smuggling lanes) / R0048 fire from the bus payload without the lifter having to reach back to disk. Extractors: * body_simhash — inlined 64-bit Charikar simhash (md5-keyed, frequency-weighted) over word tokens of the union of text/* body parts. Inlined rather than pulling the `simhash` PyPI dep, which transitively brings numpy ~50 MB into a slim decky container; the algorithm is ~15 lines and identical in extraction quality. * body_base64_bytes — largest decoded base64 chunk's byte count, scanning text body parts with the same `_BASE64_RE` the lifter's `_p_encoded_payload` fallback uses. R0048 fires from this scalar alone; the lifter's body_text fallback becomes dead in normal operation. * attachment_macro_indicator — stdlib zipfile sniff for `vbaProject.bin` inside OOXML containers. Catches modern .docm / .xlsm / .pptm and macro-injected .docx; legacy .xls (CFBF) is a follow-up. * attachment_encrypted — flag_bits & 0x01 on any ZIP / OOXML entry's central directory; magic-byte match for 7z / RAR / CFBF (encrypted Office wrap). * html_smuggling — structural lxml parse first: fires when an `<a download>` element coexists with a `<script>` referencing `Blob` / `Uint8Array` / `URL.createObjectURL`. Regex pair-check fallback on lxml parse failure (real-world phish HTML is often malformed). Cuts the FP rate that pure-regex would produce on legitimate "click to download" links. Add `python3-lxml` (~5 MB Debian package, C-extension, no transitive Python deps) to the SMTP decky's Dockerfile. simhash stays inline. Per the dependency rule: lxml earns its weight by cutting R0046's OR-combined FP rate; a heavier macro-detection lib (oletools ~5 MB pure-python with msoffcrypto) would not measurably improve the boolean signal we need, so stdlib stays for that lane.	2026-05-02 19:08:37 -04:00
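For readers unfamiliar with the technique, a 64-bit Charikar simhash with md5-keyed, frequency-weighted tokens can be sketched in a few lines. This is an illustrative version under stated assumptions, not the extractor from the commit: the tokenizer regex, the choice of the first 8 digest bytes, and the `simhash64` / `hamming` names are all hypothetical.

```python
import hashlib
import re
from collections import Counter

_WORD_RE = re.compile(r"\w+")  # assumed tokenizer; the real one may differ


def simhash64(text: str) -> str:
    # Token frequency acts as the per-token weight.
    weights = Counter(_WORD_RE.findall(text.lower()))
    if not weights:
        return ""  # empty body maps to an empty simhash
    tally = [0] * 64
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # take 64 bits of the md5
        for bit in range(64):
            tally[bit] += weight if (h >> bit) & 1 else -weight
    # A result bit is 1 where the weighted vote for that bit is positive.
    value = sum(1 << bit for bit in range(64) if tally[bit] > 0)
    return f"{value:016x}"


def hamming(a: str, b: str) -> int:
    # Number of differing bits between two 16-hex-char simhashes.
    return bin(int(a, 16) ^ int(b, 16)).count("1")
```

Near-duplicate bodies tend to produce hashes with a small Hamming distance, which is what makes a single 16-hex-char value usable for mass-phish clustering without shipping the body itself.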
anti	fb85762703	feat(bus): publish email.received from ingester after SMTP artifact persist Wires the EmailLifter (R0041–R0048) producer that DEBT.md item #3 deferred. After the existing add_bounty() call in _extract_bounty (line 615), call _publish_email_received() which: * resolves the attacker_uuid via repo.get_attacker_uuid_by_ip; drops the publish if unresolved (the TTP worker can't anchor orphan events) * projects the message_stored fields onto the EmailLifter wire contract: from_domain / mail_from_domain / return_path_domain parsed via _domain_of, rcpt_count + rcpt_domains via _rcpt_projection, attachment_sha256s + attachment_extensions derived from the existing attachments_json manifest, urls from urls_json, dkim_signed/spf_pass coerced from 0/1 ints to bool * mirrors _publish_probe_pending's bus-per-call pattern and swallows all exceptions (the bus is the notification layer, not the source of truth) Fires for both relay and non-relay SMTP services. R0041 / R0043 / R0044 / R0045 are now live end-to-end; R0046 partial (extension lane). Heavyweight predicates (R0042 simhash, R0046-deep, R0047 / R0048 body_text) stay deferred per the EmailLifter heavyweight DEBT entry.	2026-05-02 18:39:13 -04:00
anti	e9324acac7	feat(smtp): emit X-Mailer / Return-Path / dkim+spf / URLs on message_stored The EmailLifter (R0041–R0048) keys on header-derived signals that the v0 _summarize_message did not extract. Add cheap Layer 2 projections inside the existing single-pass parse: * return_path / x_mailer — direct header reads, decoded RFC 2047 * dkim_signed / spf_pass — booleans derived from any Authentication-Results header (multiple lines tolerated; positive verdict on any line wins) * urls — http(s) URLs lifted from text/* body parts via a tight regex, deduplicated first-seen-wins, capped at 64 in the wire payload to bound the syslog SD value Heavyweight extraction (body simhash, office-macro detection, HTML-smuggling, password-protected archives, mal-hash-match, body_text projection) stays deferred per the EmailLifter heavyweight DEBT entry — those rules need privacy / extractor decisions before they ship.	2026-05-02 18:37:11 -04:00
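Two of the cheap projections above, URL lifting and the Authentication-Results verdicts, could be approximated as follows. This is a sketch under assumptions: the regex, the `extract_urls` / `auth_verdicts` names, and the treatment of `dkim=pass` / `spf=pass` substrings as the positive verdict are illustrative choices, not the logic in `_summarize_message`.

```python
import re
from email import message_from_string

_URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)  # assumed pattern
MAX_URLS = 64  # cap stated in the commit message


def extract_urls(body: str, cap: int = MAX_URLS) -> list[str]:
    # dict.fromkeys preserves insertion order, giving first-seen-wins dedup.
    seen = dict.fromkeys(_URL_RE.findall(body))
    return list(seen)[:cap]


def auth_verdicts(raw_message: str) -> dict:
    msg = message_from_string(raw_message)
    # Multiple Authentication-Results headers are tolerated.
    lines = [str(v).lower() for v in msg.get_all("Authentication-Results", [])]
    return {
        # A positive verdict on any line wins.
        "dkim_signed": any("dkim=pass" in line for line in lines),
        "spf_pass": any("spf=pass" in line for line in lines),
    }
```

Capping after deduplication keeps the payload bounded while still preferring the earliest URLs in the body.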
anti	75ff0ede1f	fix(ttp): correct intel_lifter mappings + repoint ThreatFox to threat_type Three bug classes uncovered by the 2026-05-02 ship-time audit: * AbuseIPDB code/name mismatch in v1: cat 10 was treated as DDoS (it's Web Spam — DDoS is cat 4, intentionally unmapped per A.10) and cat 17 as VPN IP (it's Spoofing — VPN IP is cat 13). Both typos mirrored in code AND the design doc Appendix A.10. Code now matches the AbuseIPDB taxonomy exactly; cat 17 retargets to T1566 (email-spoofing as a phishing precursor), and cats 7 (Phishing) and 16 (SQL Injection) pick up T1566 / T1190 emissions that v1 didn't cover. * ThreatFox dispatch keyed on `ioc_type` in v1, but `ioc_type` is the indicator format (url / domain / hash variants) and carries no ATT&CK signal. The canonical taxonomy field per ThreatFox's API is `threat_type` (botnet_cc / payload_delivery / payload / cc_skimming). Repoint dispatch through the new `threatfox_threat_types` payload field; `ioc_type` rides as evidence only. Also adds the missing cc_skimming -> T1056 (Input Capture) mapping and registers T1056 in attack_catalog.py. * GreyNoise bare-malicious lane: a `classification == "malicious"` row with no recognised tag used to emit nothing. Now lights T1071 at a half multiplier, suppressed when a tag already fires T1071 to avoid double-stamping at conflicting confidence levels.	2026-05-02 18:08:48 -04:00
anti	a31ad82880	feat(intel): project per-provider taxonomy into attacker.intel.enriched payload The TTP worker forwards the bus payload verbatim to the IntelLifter as TaggerEvent.payload. The pre-audit publish payload only carried {attacker_uuid, attacker_ip, aggregate_verdict, providers}, so even with the new AttackerIntel taxonomy columns populated the lifter still saw nothing. Lift the relevant fields (categories / tags / threat_types / malware family / score / classification) into the bus event and decode JSON-string list columns back to native lists at the boundary.	2026-05-02 18:08:29 -04:00
anti	999d3494b4	feat(intel): persist per-provider taxonomy on AttackerIntel for TTP dispatch The 2026-05-02 ship-time audit of the R0054-R0058 intel rule pack found that AbuseIPDB / GreyNoise / ThreatFox stored only the aggregate verdict (score / classification / listed-bool) plus the raw response blob. The TTP IntelLifter expects per-provider taxonomy fields (categories, tags, threat_types) that were never populated, so R0054 / R0055 / R0057 emitted zero tags in production despite passing unit tests. Add typed columns: abuseipdb_categories, greynoise_tags, greynoise_name, feodo_malware_family, threatfox_threat_types, threatfox_ioc_types, threatfox_malware_families. Each provider now parses the relevant taxonomy out of the upstream response and writes it through column_updates. JSON-list columns ride as TEXT with default "[]" to keep the SQLite/MySQL backend split honest, deserialised back to native lists by the repo on read.	2026-05-02 18:07:57 -04:00
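The JSON-list-as-TEXT convention described here is simple to sketch. The helper names below are hypothetical; the only details taken from the commit are the TEXT storage, the `"[]"` default, and deserialisation back to native lists on read.

```python
import json


def encode_list_column(values) -> str:
    # Store a list as JSON text; None or empty becomes the "[]" default.
    return json.dumps(list(values or []))


def decode_list_column(raw) -> list:
    # Already-native lists pass through untouched.
    if isinstance(raw, list):
        return raw
    try:
        decoded = json.loads(raw or "[]")
    except (TypeError, ValueError):
        return []  # malformed column content degrades to an empty list
    return decoded if isinstance(decoded, list) else []
```

Decoding defensively at the boundary means downstream consumers can always iterate the value without checking its type.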
anti	d1c4a48963	feat(ttp): split bash CMD evidence into structured uid/user/src/pwd/cmd rows The inspector was dumping the whole `CMD uid=0 user=root src=… pwd=… cmd=nmap -p- 192.168.1.0/24` syslog body into a single ``command_text`` blob. ANTI: "I'd like to separate the fields." Done — three layers work together: 1. Collector session aggregator: new `_parse_cmd_msg` splits the bash PROMPT_COMMAND msg into `{uid, user, src, pwd, command}`. The session-ended envelope's per-command dict now carries the structured fields, with `command_text` set to just the cmd= value (preserving embedded whitespace — `nmap -p- 1.2.3.0/24` etc.). 2. Rule engine: per-source_kind auxiliary evidence list (`_AUX_EVIDENCE_FIELDS`). For `command` events the engine automatically promotes uid/user/src/pwd into the persisted `evidence` dict on top of the rule's explicit `evidence_fields`. Engine-controlled, not per-rule — adding a new aux field is one line here, not a 30-rule YAML sweep, and rule authors can't accidentally drop it. 3. TTPInspector frontend: evidence renders as a structured `kvs` grid (UID / USER / SRC / PWD / CMD rows) instead of pretty-printed JSON. Primary-order list keeps shell fields at the top; everything else falls below alphabetically so unfamiliar evidence shapes still surface predictably. Tests: - session_aggregator pins the structured-fields emit (uid/user/src/ pwd/command_text without "CMD" prefix, embedded whitespace preserved). - rule_engine_tagger pins the aux-field auto-promotion + the no-`None`-leakage path when payload doesn't carry an aux key.	2026-05-02 03:20:53 -04:00
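The field split performed by the collector can be illustrated with a single anchored regex. This is a sketch rather than the project's `_parse_cmd_msg`: the pattern assumes the fields always appear in the order shown in the commit message and that `pwd` contains no whitespace, neither of which the source confirms.

```python
import re
from typing import Optional

# Assumed layout: "CMD uid=… user=… src=… pwd=… cmd=<rest of line>"
_CMD_RE = re.compile(
    r"^CMD\s+uid=(?P<uid>\S+)\s+user=(?P<user>\S+)\s+src=(?P<src>\S+)"
    r"\s+pwd=(?P<pwd>\S+)\s+cmd=(?P<command>.*)$",
    re.DOTALL,
)


def parse_cmd_msg(msg: str) -> Optional[dict]:
    # Returns the structured fields, or None when the message has another shape.
    match = _CMD_RE.match(msg)
    return match.groupdict() if match else None
```

Because `cmd=` is matched last and captures everything after it, whitespace embedded in the command survives unchanged.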
anti	84699f89da	feat(ttp): show canonical ATT&CK technique names in the TTPs UI "T1595" alone is opaque; "T1595 — Active Scanning" tells you the story at a glance. The names come from a backend-side static catalogue pinned to the same ATT&CK release as the rule engine (_ATTACK_RELEASE = "v15.1") — names are the canonical MITRE labels, not author-supplied strings on rules, so a rule author can't typo a name and the entire fleet sees the typo. - New `decnet/ttp/attack_catalog.py` with `TECHNIQUE_NAMES` covering every technique_id + sub_technique_id emitted by `rules/ttp/` (R0001..R0058 → 69 IDs in the v0 pack). - `IdentityTechniqueRow` / `TechniqueRollupRow` / `CampaignTechniqueRow` / `TTPTagDetailRow` gain optional `technique_name` / `sub_technique_name` fields. Repo + router populate them from the catalogue at row-construction time. None when an ID isn't in the catalogue — UI falls back to the bare ID. - Coverage test (`tests/ttp/test_attack_catalog.py`) walks every YAML rule and asserts every emitted ID has a catalogue entry, so a future rule author who forgets to update the catalogue gets a loud failure rather than a silent UI fallback. Frontend: - `TTPsObservedSection` shows "T1595.002 — Active Scanning: Vulnerability Scanning" instead of just the ID, with overflow ellipsis + tooltip for narrow viewports. Inspector header / TECHNIQUE row also surface the names.	2026-05-02 03:10:07 -04:00


206 Commits