feat(intel): persist per-provider taxonomy on AttackerIntel for TTP dispatch

The 2026-05-02 ship-time audit of the R0054-R0058 intel rule pack found that the AbuseIPDB / GreyNoise / ThreatFox providers stored only the aggregate verdict (score / classification / listed-bool) plus the raw response blob. The TTP IntelLifter expects per-provider taxonomy fields (categories, tags, threat_types) that were never populated, so R0054 / R0055 / R0057 emitted zero tags in production despite passing unit tests.

Add typed columns: abuseipdb_categories, greynoise_tags, greynoise_name, feodo_malware_family, threatfox_threat_types, threatfox_ioc_types, threatfox_malware_families.

Each provider now parses the relevant taxonomy out of the upstream response and writes it through column_updates. JSON-list columns are stored as TEXT with default "[]" to keep the SQLite/MySQL backend split honest; the repo deserialises them back to native lists on read.
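
The TEXT round trip for the JSON-list columns can be sketched as below. This is a minimal illustration, not the repo's actual code: dump_taxonomy and load_taxonomy are hypothetical names, and the tolerant fallback to an empty list on malformed input is an assumption rather than documented behaviour.

```python
import json


def dump_taxonomy(values):
    """Serialise strings as a sorted, de-duplicated JSON list (hypothetical helper)."""
    # Keep only non-empty strings, mirroring the isinstance checks in the provider.
    cleaned = {v for v in values if isinstance(v, str) and v}
    # Sorting makes the stored TEXT deterministic across runs.
    return json.dumps(sorted(cleaned))


def load_taxonomy(text):
    """Parse a TEXT column back into a native list (hypothetical helper)."""
    if not text:
        return []
    try:
        parsed = json.loads(text)
    except (TypeError, ValueError):
        # Assumption: malformed rows degrade to "no taxonomy" instead of raising.
        return []
    if not isinstance(parsed, list):
        return []
    return [v for v in parsed if isinstance(v, str)]


stored = dump_taxonomy({"payload_delivery", "botnet_cc"})
restored = load_taxonomy(stored)
```

Because an unlisted IP is written as the literal "[]", a reader of the column gets an empty list instead of NULL and needs no special case for "queried, no match".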
2026-05-02 18:07:57 -04:00
parent d1c4a48963
commit 999d3494b4
10 changed files with 272 additions and 1 deletions
--- a/decnet/intel/threatfox.py
+++ b/decnet/intel/threatfox.py
@@ -71,6 +71,9 @@ class ThreatFoxProvider(IntelProvider):
                verdict=None,  # absence is not a benign signal
                column_updates={
                    "threatfox_listed": False,
+                    "threatfox_threat_types": "[]",
+                    "threatfox_ioc_types": "[]",
+                    "threatfox_malware_families": "[]",
                    "threatfox_raw": "{}",
                    "threatfox_queried_at": datetime.now(timezone.utc),
                },
@@ -83,11 +86,36 @@ class ThreatFoxProvider(IntelProvider):

        data = payload.get("data") or []
        listed = bool(data)
+        # Each match in ``data`` carries threat_type / ioc_type / malware
+        # (canonical family, with malware_printable as the fallback). The
+        # IntelLifter dispatches ATT&CK techniques off ``threat_type``
+        # (botnet_cc / payload_delivery / payload / cc_skimming); the other
+        # two columns are evidence and SIEM context. Values are unioned
+        # across matches and serialised sorted for determinism.
+        threat_types: set[str] = set()
+        ioc_types: set[str] = set()
+        families: set[str] = set()
+        if isinstance(data, list):
+            for entry in data:
+                if not isinstance(entry, dict):
+                    continue
+                tt = entry.get("threat_type")
+                if isinstance(tt, str) and tt:
+                    threat_types.add(tt)
+                it = entry.get("ioc_type")
+                if isinstance(it, str) and it:
+                    ioc_types.add(it)
+                family = entry.get("malware") or entry.get("malware_printable")
+                if isinstance(family, str) and family:
+                    families.add(family)
        return IntelResult(
            provider=self.name,
            verdict="malicious" if listed else None,
            column_updates={
                "threatfox_listed": listed,
+                "threatfox_threat_types": json.dumps(sorted(threat_types)),
+                "threatfox_ioc_types": json.dumps(sorted(ioc_types)),
+                "threatfox_malware_families": json.dumps(sorted(families)),
                "threatfox_raw": json.dumps(data),
                "threatfox_queried_at": datetime.now(timezone.utc),
            },