feat(intel): persist per-provider taxonomy on AttackerIntel for TTP dispatch

The 2026-05-02 ship-time audit of the R0054-R0058 intel rule pack found that AbuseIPDB / GreyNoise / ThreatFox stored only the aggregate verdict (score / classification / listed-bool) plus the raw response blob. The TTP IntelLifter expects per-provider taxonomy fields (categories, tags, threat_types) that were never populated, so R0054 / R0055 / R0057 emitted zero tags in production despite passing unit tests.

Add typed columns:

- abuseipdb_categories
- greynoise_tags
- greynoise_name
- feodo_malware_family
- threatfox_threat_types
- threatfox_ioc_types
- threatfox_malware_families

Each provider now parses the relevant taxonomy out of the upstream response and writes it through column_updates. JSON-list columns ride as TEXT with default "[]" to keep the SQLite/MySQL backend split honest, and the repo deserialises them back to native lists on read.
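
For illustration only, here is a minimal standalone sketch of the write/read round trip described above: flatten the per-report category codes into a sorted list, serialise it for a TEXT column, and decode it back to a native list. The helper names (flatten_categories, to_text_column, from_text_column) are hypothetical and are not part of this commit. The repo's actual read path is not shown in this diff, so the tolerant decoding below is an assumption. The sketch also skips bool values, which the committed code does not do (bool is a subclass of int in Python).

```python
import json
from typing import Any, Optional


def flatten_categories(data: dict) -> list:
    """Union of integer category codes across all reports, sorted."""
    categories = set()
    for report in data.get("reports") or []:
        if not isinstance(report, dict):
            continue  # ignore malformed report entries
        for cat in report.get("categories") or []:
            # Assumption: True/False should not count as category codes,
            # so bool is excluded explicitly (it passes isinstance(x, int)).
            if isinstance(cat, int) and not isinstance(cat, bool):
                categories.add(cat)
    return sorted(categories)


def to_text_column(values: list) -> str:
    """Serialise a list for a TEXT column; an empty list becomes "[]"."""
    return json.dumps(values)


def from_text_column(raw: Optional[str]) -> list:
    """Hypothetical read-side helper: TEXT back to a list.

    NULL, empty, malformed, or non-list JSON all decode to [].
    """
    if not raw:
        return []
    try:
        parsed: Any = json.loads(raw)
    except (TypeError, ValueError):
        return []
    return parsed if isinstance(parsed, list) else []


if __name__ == "__main__":
    payload = {"reports": [{"categories": [18, 22]}, {"categories": [22, 14]}]}
    stored = to_text_column(flatten_categories(payload))
    print(stored)
    print(from_text_column(stored))
```

Sorting before serialisation keeps the stored string stable across runs, which is the determinism the in-code comment calls out for tests and bus payload diffs.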
2026-05-02 18:07:57 -04:00
parent d1c4a48963
commit 999d3494b4
10 changed files with 272 additions and 1 deletions
--- a/decnet/intel/abuseipdb.py
+++ b/decnet/intel/abuseipdb.py
@@ -93,11 +93,24 @@ class AbuseIPDBProvider(IntelProvider):
        data = payload.get("data") or {}
        score = int(data.get("abuseConfidenceScore") or 0)
        verdict = _score_to_verdict(score)
+        # AbuseIPDB returns ``data.reports[*].categories`` — a list of
+        # int codes per report. Flatten the union across all recent
+        # reports so the IntelLifter sees the full activity profile,
+        # not just the most-recent report's categories. Sorted for
+        # determinism (matters for tests + for the bus payload diff).
+        categories: set[int] = set()
+        for report in data.get("reports") or []:
+            if not isinstance(report, dict):
+                continue
+            for cat in report.get("categories") or []:
+                if isinstance(cat, int):
+                    categories.add(cat)
        return IntelResult(
            provider=self.name,
            verdict=verdict,
            column_updates={
                "abuseipdb_score": score,
+                "abuseipdb_categories": json.dumps(sorted(categories)),
                "abuseipdb_raw": json.dumps(data),
                "abuseipdb_queried_at": datetime.now(timezone.utc),
            },