From d5b3efe30e7f9c0b7469696a4ba01296f4d814e0 Mon Sep 17 00:00:00 2001
From: anti
Date: Sun, 26 Apr 2026 07:15:55 -0400
Subject: [PATCH] docs(wiki): Identity-Resolution page + identity.* topics in Service-Bus

Documents the observation/identity/campaign three-level hierarchy, the
read-only API surface, the deferred clusterer worker, and how to test
the substrate. Companion to development/IDENTITY_RESOLUTION.md in the
main repo.

Service-Bus.md gains the three identity.* topic rows (reserved for the
future clusterer); sidebar links Identity-Resolution under Developer
docs alongside Campaign-Clustering.
---
 Identity-Resolution.md | 169 +++++++++++++++++++++++++++++++++++++++++
 Service-Bus.md         |   3 +
 _Sidebar.md            |   1 +
 3 files changed, 173 insertions(+)
 create mode 100644 Identity-Resolution.md

diff --git a/Identity-Resolution.md b/Identity-Resolution.md
new file mode 100644
index 0000000..ecf8080
--- /dev/null
+++ b/Identity-Resolution.md
@@ -0,0 +1,169 @@
+# Identity Resolution
+
+Pre-implementation feature. The clusterer worker that populates these
+rows is a separate downstream effort; the substrate (schema, API,
+frontend, bus topics) ships first so downstream work and the campaign
+clustering fixtures can target a stable shape.
+
+The full design lives in the repo at
+[`development/IDENTITY_RESOLUTION.md`](https://github.com/dec-net/decnet/blob/main/development/IDENTITY_RESOLUTION.md).
+This page documents the current substrate.
+
+---
+
+## The three-level hierarchy
+
+DECNET's previous data model conflated two distinct concepts in one
+table. The new model separates three levels:
+
+| Level | Unit | How it's created | Stability |
+|---|---|---|---|
+| `attackers` | per-IP — "we saw activity from X starting at T" | profiler ingest, dumb / synchronous | mutable — IPs come and go |
+| `attacker_identities` | per-actor — "these N observations are the same hands" | clusterer, async, on stable fingerprints (JA3, HASSH, payload, C2, kd_digraph_simhash) | semi-stable — tightens as evidence accumulates |
+| `campaigns` | per-operation — "these M identities are coordinated" | clusterer, async, on shared infra / tooling / phase handoff | derived from identities |
+
+`attackers` keeps its name and user-facing meaning ("the attacker the
+operator clicked"). It plays the role of **observation** under the new
+model — one row per source IP. The dedup'd "same hands" view lives
+alongside it in `attacker_identities`.
+
+The clusterer (per-IP observation → identity → campaign) is the same
+problem at different scales: clustering on increasingly meta signals.
+See [Campaign-Clustering](Campaign-Clustering) for the campaign layer.
+
+---
+
+## Schema
+
+### `AttackerIdentity` (new in `decnet/web/db/models/attackers.py`)
+
+| Column | Type | Notes |
+|---|---|---|
+| `uuid` | TEXT PK | uuid4(); not fingerprint-derived |
+| `schema_version` | INT, default 1 | Federation gossip compat from day one |
+| `campaign_id` | TEXT FK nullable | Set by the campaign clusterer |
+| `first_seen_at` / `last_seen_at` / `created_at` / `updated_at` | TIMESTAMP | |
+| `confidence` | REAL nullable | Clusterer's identity-cohesion score |
+| `observation_count` | INT default 0 | Denormalized; live count via API |
+| `ja3_hashes` / `hassh_hashes` / `payload_simhashes` / `c2_endpoints` | JSON-in-TEXT nullable | Multi-tool actors get multiple values |
+| `kd_digraph_simhash` | BINARY(8) nullable | V2 keystroke-dynamics hook |
+| `merged_into_uuid` | TEXT self-FK nullable | Soft-merge audit trail |
+| `notes` | TEXT nullable | Operator-editable annotations |
+
+### `attackers.identity_id` (new column)
+
+Nullable indexed FK to `attacker_identities.uuid`. NULL until the
+clusterer resolves an identity for the row. Ingestion paths
+(profiler, correlator) keep upserting `attackers` rows without
+touching `identity_id`.
+
+---
+
+## API
+
+All endpoints are read-only and auth-gated identically to
+`/api/v1/attackers/*`.
+
+| Method | Path | Returns |
+|---|---|---|
+| GET | `/api/v1/identities` | Paginated list, newest-updated first; excludes merged-out rows |
+| GET | `/api/v1/identities/{uuid}` | Detail row + `observation_count_live`. Transparently follows `merged_into_uuid` to surface the canonical winner |
+| GET | `/api/v1/identities/{uuid}/observations` | Paginated `Attacker` rows FK'd to the (resolved) identity uuid |
+
+Empty result / 404 is the universal response while the clusterer hasn't
+ramped up yet.
+
+```bash
+# Empty list while the clusterer hasn't run
+curl -H "Authorization: Bearer $TOKEN" \
+  http://localhost:8000/api/v1/identities
+# {"total": 0, "limit": 50, "offset": 0, "data": []}
+
+# Empty 404 for any uuid
+curl -H "Authorization: Bearer $TOKEN" \
+  http://localhost:8000/api/v1/identities/00000000-0000-0000-0000-000000000000
+# {"detail": "Identity not found"}
+```
+
+---
+
+## Bus topics
+
+Constants ship in `decnet.bus.topics`; **no publishers exist yet**.
+Subscribers can register against `identity.>` from day one and start
+receiving events the instant the clusterer comes online.
+
+| Topic | Payload | When |
+|---|---|---|
+| `identity.formed` | `{identity_uuid, observation_uuids: [...], confidence, first_seen_at}` | Clusterer creates a new identity from one or more observations |
+| `identity.observation.linked` | `{identity_uuid, observation_uuid, confidence_after}` | Observation attached / re-attached to an identity |
+| `identity.merged` | `{winner_uuid, loser_uuid, observation_uuids: [...], confidence_after}` | Two identities collapsed. Loser keeps its row with `merged_into_uuid` set; subscribers re-key cached references to the winner |
+
+Built via `topics.identity(IDENTITY_FORMED)` etc. See
+[Service-Bus](Service-Bus) for the full topic table.
+
+`identity.campaign.assigned` is deferred and will land alongside the
+campaign clusterer.
+
+---
+
+## Frontend
+
+`decnet_web/src/components/IdentityDetail.tsx` — `/identities/:id`
+
+- Header with uuid, optional `CAMPAIGN · ` badge if assigned,
+  optional `MERGED INTO ` link (clicks navigate to the winner).
+- Stats row: live observation count, distinct JA3, HASSH, payload
+  SimHashes, C2 endpoints.
+- Confidence + schema version (only rendered if populated).
+- Fingerprint detail tag lists for JA3, HASSH, C2 endpoints.
+- Observations table (linked rows back to AttackerDetail).
+- Optional analyst-notes panel.
+
+`AttackerDetail` gains a conditional `IDENTITY · ` badge in
+the header when `identity_id` is non-null. Click → `/identities/`.
+Zero behavior change while `identity_id` is uniformly NULL.
+
+---
+
+## What hasn't been built yet
+
+- **Clusterer worker.** Reads observations, computes fingerprint
+  similarity (Hamming on simhashes, Jaccard / weighted edges on
+  hash sets), runs connected-components, writes identities, publishes
+  bus events. Designed in
+  [Campaign-Clustering](Campaign-Clustering) §4 and the in-repo
+  `CAMPAIGN_CLUSTERING.md`.
+- **Identity-level intel** (`attacker_identity_intel`). Aggregate
+  reputation, threat-actor naming from MISP/CTI, MITRE ATT&CK tags.
+  Different lifecycle than the IP-scoped `attacker_intel` (DEBT-041);
+  separate table, separate enricher. The current API aggregates
+  observation intel on read in the meantime.
+- **`SessionProfile.identity_id` FK.** Open question for V2 keystroke
+  dynamics. Currently sessions FK to `Log`, not `Attacker` / identity.
+- **Webhook payload `identity_id` enrichment.** To be added
+  opportunistically once identities are populated.
+
+---
+
+## Testing
+
+```bash
+source .311/bin/activate
+
+# Schema invariants (table exists, FK targets, nullable columns,
+# constraint blocks orphans, schema_version defaults to 1).
+pytest tests/db/test_identity_schema.py -v
+
+# API surface against the empty table.
+pytest tests/web/test_api_identities.py -v
+
+# Topic constants and builder.
+pytest tests/bus/test_topics.py -v -k identity
+```
+
+---
+
+See also: [Campaign-Clustering](Campaign-Clustering) (the next layer
+up), [Service-Bus](Service-Bus) (topic table),
+[Module-Reference-Web](Module-Reference-Web).
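The "What hasn't been built yet" list in Identity-Resolution.md describes the clusterer as Hamming distance on simhashes, Jaccard on hash sets, then connected components. The following is a minimal Python sketch of that pipeline, not the planned implementation. It assumes hypothetical observation dicts with `simhash` and `ja3` keys, uses illustrative thresholds (`max_hamming=3`, `min_jaccard=0.5`) that do not come from the design doc, and replaces the weighted edges mentioned in the page with a plain either-signal threshold.

```python
from itertools import combinations


def hamming64(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit simhashes."""
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(observations: dict, max_hamming: int = 3, min_jaccard: float = 0.5) -> list:
    """Group observation ids into identities via connected components.

    `observations` maps an observation id to a dict with the hypothetical
    keys `simhash` (int or None) and `ja3` (set of hash strings).
    """
    # Union-find forest: every observation starts as its own component.
    parent = {oid: oid for oid in observations}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # An edge exists when either signal clears its (assumed) threshold.
    for a, b in combinations(observations, 2):
        oa, ob = observations[a], observations[b]
        near = (
            oa["simhash"] is not None
            and ob["simhash"] is not None
            and hamming64(oa["simhash"], ob["simhash"]) <= max_hamming
        )
        if near or jaccard(oa["ja3"], ob["ja3"]) >= min_jaccard:
            parent[find(a)] = find(b)

    # Collect members per root; each group is one candidate identity.
    groups = {}
    for oid in observations:
        groups.setdefault(find(oid), set()).add(oid)
    return sorted(groups.values(), key=lambda g: sorted(g))
```

Each returned set would correspond to one `attacker_identities` row under these assumptions. The pairwise loop is quadratic in the number of observations, so a real worker would likely need blocking or indexing before comparing fingerprints.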
diff --git a/Service-Bus.md b/Service-Bus.md
index 24782e9..17cf426 100644
--- a/Service-Bus.md
+++ b/Service-Bus.md
@@ -151,6 +151,9 @@ Current topic families:
 | `attacker.observed` | Correlator | first sighting; consumed by `decnet enrich` as a wake signal |
 | `attacker.scored` | Profiler | post-enrichment score update; also wakes `decnet enrich` |
 | `attacker.intel.enriched` | `decnet enrich` | `{attacker_ip, aggregate_verdict, providers}` after a threat-intel pass; webhook → SIEM |
+| `identity.formed` | _reserved (clusterer)_ | `{identity_uuid, observation_uuids: [...], confidence, first_seen_at}` — clusterer creates a new identity from one or more observations |
+| `identity.observation.linked` | _reserved (clusterer)_ | `{identity_uuid, observation_uuid, confidence_after}` — observation attached / re-attached to an identity |
+| `identity.merged` | _reserved (clusterer)_ | `{winner_uuid, loser_uuid, observation_uuids: [...], confidence_after}` — two identities collapsed; subscribers re-key cached references to the winner |
 | `system.log` | _reserved_ | — |
 | `system.bus.health` | Bus worker heartbeat | `{ts, uptime_s}` |
 
diff --git a/_Sidebar.md b/_Sidebar.md
index cc18462..1facde3 100644
--- a/_Sidebar.md
+++ b/_Sidebar.md
@@ -47,6 +47,7 @@
 - [PKI-and-mTLS](PKI-and-mTLS)
 - [Testing-and-CI](Testing-and-CI)
 - [Campaign-Clustering](Campaign-Clustering)
+- [Identity-Resolution](Identity-Resolution)
 - [Performance-Story](Performance-Story)
 - [Tracing-and-Profiling](Tracing-and-Profiling)
 
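The `identity.merged` row added to Service-Bus.md says subscribers re-key cached references to the winner, and the API page says the detail endpoint follows `merged_into_uuid` to the canonical identity. A minimal Python sketch of how a subscriber might do both is below. The `cache` and `merged_into` dicts and the handler name are hypothetical subscriber state, not part of DECNET, and the sketch assumes `observation_uuids` lists observations that now belong to the winner.

```python
def resolve(uuid: str, merged_into: dict[str, str]) -> str:
    """Follow loser -> winner links until reaching a canonical uuid."""
    seen = set()
    while uuid in merged_into:
        if uuid in seen:
            raise ValueError(f"merge cycle at {uuid}")
        seen.add(uuid)
        uuid = merged_into[uuid]
    return uuid


def on_identity_merged(
    event: dict, merged_into: dict[str, str], cache: dict[str, str]
) -> None:
    """Apply one `identity.merged` payload to local subscriber state.

    `cache` maps observation uuid -> identity uuid (hypothetical).
    `merged_into` maps loser uuid -> winner uuid, mirroring the
    `merged_into_uuid` column.
    """
    winner, loser = event["winner_uuid"], event["loser_uuid"]
    merged_into[loser] = winner

    # Assumption: the listed observations are attached to the winner.
    for obs_uuid in event.get("observation_uuids", []):
        cache[obs_uuid] = winner

    # Re-key every cached reference, which also covers chained merges.
    for obs_uuid, ident in list(cache.items()):
        cache[obs_uuid] = resolve(ident, merged_into)
```

Routing every cached value through `resolve` means a loser that was itself an earlier winner still ends up pointing at the final canonical identity, and the cycle check guards against a malformed merge chain.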