docs(wiki): Identity-Resolution page + identity.* topics in Service-Bus

Documents the observation/identity/campaign three-level hierarchy, the
read-only API surface, the deferred clusterer worker, and how to test
the substrate. Companion to development/IDENTITY_RESOLUTION.md in the
main repo.

Service-Bus.md gains the three identity.* topic rows (reserved for the
future clusterer); sidebar links Identity-Resolution under Developer
docs alongside Campaign-Clustering.
2026-04-26 07:15:55 -04:00
parent ba1862f380
commit d5b3efe30e
3 changed files with 173 additions and 0 deletions

Identity-Resolution.md (new file, 169 lines added)

@@ -0,0 +1,169 @@
# Identity Resolution
Pre-implementation feature. The clusterer worker that populates these
rows is a separate downstream effort; the substrate (schema, API,
frontend, bus topics) ships first so downstream work and the campaign
clustering fixtures can target a stable shape.
The full design lives in the repo at
[`development/IDENTITY_RESOLUTION.md`](https://github.com/dec-net/decnet/blob/main/development/IDENTITY_RESOLUTION.md).
This page documents the current substrate.
---
## The three-level hierarchy
DECNET's previous data model conflated two distinct concepts in one
table: the per-IP observation and the per-actor identity. The new model
separates three levels:
| Level | Unit | How it's created | Stability |
|---|---|---|---|
| `attackers` | per-IP — "we saw activity from X starting at T" | profiler ingest, dumb / synchronous | mutable — IPs come and go |
| `attacker_identities` | per-actor — "these N observations are the same hands" | clusterer, async, on stable fingerprints (JA3, HASSH, payload, C2, kd_digraph_simhash) | semi-stable — tightens as evidence accumulates |
| `campaigns` | per-operation — "these M identities are coordinated" | clusterer, async, on shared infra / tooling / phase handoff | derived from identities |
`attackers` keeps its name and user-facing meaning ("the attacker the
operator clicked"). It plays the role of **observation** under the new
model — one row per source IP. The dedup'd "same hands" view lives
alongside it in `attacker_identities`.
Each step of the clusterer (per-IP observation → identity → campaign)
is the same problem at a different scale: clustering on increasingly
abstract signals.
See [Campaign-Clustering](Campaign-Clustering) for the campaign layer.
---
## Schema
### `AttackerIdentity` (new in `decnet/web/db/models/attackers.py`)
| Column | Type | Notes |
|---|---|---|
| `uuid` | TEXT PK | uuid4(); not fingerprint-derived |
| `schema_version` | INT, default 1 | Federation gossip compat from day one |
| `campaign_id` | TEXT FK nullable | Set by the campaign clusterer |
| `first_seen_at` / `last_seen_at` / `created_at` / `updated_at` | TIMESTAMP | |
| `confidence` | REAL nullable | Clusterer's identity-cohesion score |
| `observation_count` | INT default 0 | Denormalized; live count via API |
| `ja3_hashes` / `hassh_hashes` / `payload_simhashes` / `c2_endpoints` | JSON-in-TEXT nullable | Multi-tool actors get multiple values |
| `kd_digraph_simhash` | BINARY(8) nullable | V2 keystroke-dynamics hook |
| `merged_into_uuid` | TEXT self-FK nullable | Soft-merge audit trail |
| `notes` | TEXT nullable | Operator-editable annotations |
### `attackers.identity_id` (new column)
Nullable indexed FK to `attacker_identities.uuid`. NULL until the
clusterer resolves an identity for the row. Ingestion paths
(profiler, correlator) keep upserting `attackers` rows without
touching `identity_id`.
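
A minimal sketch of the nullable-FK behaviour described above, using the standard-library `sqlite3` module. The DDL is an assumption: it is a trimmed-down rendering of the two tables (the real models live in `decnet/web/db/models/attackers.py`), and the `ip` column, index name, and sample values are hypothetical.

```python
import sqlite3

# Assumption: a reduced version of the schema, enough to show the FK behaviour.
DDL = """
CREATE TABLE attacker_identities (
    uuid TEXT PRIMARY KEY,
    schema_version INTEGER NOT NULL DEFAULT 1,
    observation_count INTEGER NOT NULL DEFAULT 0,
    merged_into_uuid TEXT REFERENCES attacker_identities(uuid)
);
CREATE TABLE attackers (
    uuid TEXT PRIMARY KEY,
    ip TEXT NOT NULL,  -- hypothetical column name
    identity_id TEXT REFERENCES attacker_identities(uuid)
);
CREATE INDEX ix_attackers_identity_id ON attackers(identity_id);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def orphan_is_blocked(conn: sqlite3.Connection) -> bool:
    """Try to point an observation at an identity that does not exist."""
    try:
        conn.execute(
            "INSERT INTO attackers (uuid, ip, identity_id) VALUES (?, ?, ?)",
            ("obs-2", "203.0.113.8", "no-such-identity"),
        )
    except sqlite3.IntegrityError:
        return True
    return False


conn = open_db()
# Ingestion path: write the observation without touching identity_id.
conn.execute("INSERT INTO attackers (uuid, ip) VALUES (?, ?)", ("obs-1", "203.0.113.7"))
unresolved = conn.execute(
    "SELECT identity_id FROM attackers WHERE uuid = 'obs-1'"
).fetchone()[0]
print("identity_id before clustering:", unresolved)
print("orphan insert blocked:", orphan_is_blocked(conn))
```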
---
## API
All endpoints are read-only and auth-gated identically to
`/api/v1/attackers/*`.
| Method | Path | Returns |
|---|---|---|
| GET | `/api/v1/identities` | Paginated list, newest-updated first; excludes merged-out rows |
| GET | `/api/v1/identities/{uuid}` | Detail row + `observation_count_live`. Transparently follows `merged_into_uuid` to surface the canonical winner |
| GET | `/api/v1/identities/{uuid}/observations` | Paginated `Attacker` rows FK'd to the (resolved) identity uuid |
Until the clusterer starts populating rows, list endpoints return an
empty page and detail endpoints return 404.
```bash
# Empty list while the clusterer hasn't run
curl -H "Authorization: Bearer $TOKEN" \
http://localhost:8000/api/v1/identities
# {"total": 0, "limit": 50, "offset": 0, "data": []}
# 404 for any uuid while the table is empty
curl -H "Authorization: Bearer $TOKEN" \
http://localhost:8000/api/v1/identities/00000000-0000-0000-0000-000000000000
# {"detail": "Identity not found"}
```
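
The read semantics in the table above (merged-out rows hidden from the list, detail lookups following `merged_into_uuid` to the winner) can be sketched in plain Python. This is an illustration under assumptions, not the server code: rows are modelled as dicts keyed by uuid, and the function names `resolve_canonical` and `list_identities` are hypothetical. Only the response envelope is taken from the curl output above.

```python
from typing import Optional


def resolve_canonical(rows: dict[str, dict], uuid: str) -> Optional[dict]:
    """Follow merged_into_uuid until reaching a row that was not merged away."""
    seen: set[str] = set()
    current = rows.get(uuid)
    while current is not None and current.get("merged_into_uuid"):
        if current["uuid"] in seen:  # guard against an accidental merge cycle
            raise ValueError(f"merge cycle at {current['uuid']}")
        seen.add(current["uuid"])
        current = rows.get(current["merged_into_uuid"])
    return current  # None maps to the 404 case


def list_identities(rows: dict[str, dict], limit: int = 50, offset: int = 0) -> dict:
    """Newest-updated first, excluding merged-out rows."""
    live = [r for r in rows.values() if not r.get("merged_into_uuid")]
    live.sort(key=lambda r: r["updated_at"], reverse=True)
    return {
        "total": len(live),
        "limit": limit,
        "offset": offset,
        "data": live[offset:offset + limit],
    }


rows = {
    "a": {"uuid": "a", "merged_into_uuid": "b", "updated_at": "2026-04-01T00:00:00"},
    "b": {"uuid": "b", "merged_into_uuid": None, "updated_at": "2026-04-02T00:00:00"},
}
print(resolve_canonical(rows, "a")["uuid"])
print(list_identities(rows)["total"])
```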
---
## Bus topics
Constants ship in `decnet.bus.topics`; **no publishers exist yet**.
Subscribers can register against `identity.>` from day one and start
receiving events the instant the clusterer comes online.
| Topic | Payload | When |
|---|---|---|
| `identity.formed` | `{identity_uuid, observation_uuids: [...], confidence, first_seen_at}` | Clusterer creates a new identity from one or more observations |
| `identity.observation.linked` | `{identity_uuid, observation_uuid, confidence_after}` | Observation attached / re-attached to an identity |
| `identity.merged` | `{winner_uuid, loser_uuid, observation_uuids: [...], confidence_after}` | Two identities collapsed. Loser keeps its row with `merged_into_uuid` set; subscribers re-key cached references to the winner |
Topic strings are built via `topics.identity(IDENTITY_FORMED)` and the
like. See [Service-Bus](Service-Bus) for the full topic table.
`identity.campaign.assigned` is deferred and will land alongside the
campaign clusterer.
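
A subscriber that caches observation-to-identity mappings might handle the three payloads roughly as follows. This is a sketch under assumptions: the topic strings are copied from the table above rather than imported from `decnet.bus.topics`, the `IdentityCache` class is hypothetical, and `observation_uuids` on `identity.merged` is assumed to list the observations that now belong to the winner.

```python
# Assumed literal topic strings; the real constants ship in decnet.bus.topics.
IDENTITY_FORMED = "identity.formed"
IDENTITY_OBSERVATION_LINKED = "identity.observation.linked"
IDENTITY_MERGED = "identity.merged"


class IdentityCache:
    """Hypothetical subscriber state: observation uuid -> identity uuid."""

    def __init__(self) -> None:
        self.by_observation: dict[str, str] = {}

    def handle(self, topic: str, payload: dict) -> None:
        if topic == IDENTITY_FORMED:
            for obs in payload["observation_uuids"]:
                self.by_observation[obs] = payload["identity_uuid"]
        elif topic == IDENTITY_OBSERVATION_LINKED:
            # Covers both first attachment and re-attachment.
            self.by_observation[payload["observation_uuid"]] = payload["identity_uuid"]
        elif topic == IDENTITY_MERGED:
            winner, loser = payload["winner_uuid"], payload["loser_uuid"]
            # Re-key every cached reference to the loser onto the winner.
            for obs, ident in list(self.by_observation.items()):
                if ident == loser:
                    self.by_observation[obs] = winner
            for obs in payload["observation_uuids"]:
                self.by_observation[obs] = winner


cache = IdentityCache()
cache.handle(IDENTITY_FORMED, {
    "identity_uuid": "id-1", "observation_uuids": ["obs-1", "obs-2"],
    "confidence": 0.8, "first_seen_at": "2026-04-26T07:15:55",
})
cache.handle(IDENTITY_OBSERVATION_LINKED, {
    "identity_uuid": "id-2", "observation_uuid": "obs-3", "confidence_after": 0.7,
})
cache.handle(IDENTITY_MERGED, {
    "winner_uuid": "id-1", "loser_uuid": "id-2",
    "observation_uuids": ["obs-3"], "confidence_after": 0.9,
})
print(cache.by_observation)
```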
---
## Frontend
`decnet_web/src/components/IdentityDetail.tsx` serves the `/identities/:id` route:
- Header with uuid, optional `CAMPAIGN · <prefix>` badge if assigned,
optional `MERGED INTO <prefix>` link (clicks navigate to the winner).
- Stats row: live observation count, distinct JA3, HASSH, payload
SimHashes, C2 endpoints.
- Confidence + schema version (only rendered if populated).
- Fingerprint detail tag lists for JA3, HASSH, C2 endpoints.
- Observations table (linked rows back to AttackerDetail).
- Optional analyst-notes panel.
`AttackerDetail` gains a conditional `IDENTITY · <prefix>` badge in
the header when `identity_id` is non-null. Click → `/identities/<uuid>`.
Zero behavior change while `identity_id` is uniformly NULL.
---
## What hasn't been built yet
- **Clusterer worker.** Reads observations, computes fingerprint
similarity (Hamming on simhashes, Jaccard / weighted edges on
hash sets), runs connected-components, writes identities, publishes
bus events. Designed in
[Campaign-Clustering](Campaign-Clustering) §4 and the in-repo
`CAMPAIGN_CLUSTERING.md`.
- **Identity-level intel** (`attacker_identity_intel`). Aggregate
reputation, threat-actor naming from MISP/CTI, MITRE ATT&CK tags.
Different lifecycle than the IP-scoped `attacker_intel` (DEBT-041);
separate table, separate enricher. The current API aggregates
observation intel on read in the meantime.
- **`SessionProfile.identity_id` FK.** Open question for V2 keystroke
dynamics. Currently sessions FK to `Log`, not `Attacker` / identity.
- **Webhook payload `identity_id` enrichment.** To be added
opportunistically once identities are populated.
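
The clusterer outline in the first bullet (Hamming distance on simhashes, Jaccard on hash sets, connected components) can be illustrated with a toy version. This is a sketch only, not the designed worker: the observation dict shape, the `simhash` and `ja3` keys, both thresholds, and the unweighted either-signal linking rule are all assumptions made for the example.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two simhashes held as ints."""
    return bin(a ^ b).count("1")


def jaccard(a: set, b: set) -> float:
    """Overlap of two hash sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(observations: dict[str, dict],
            max_hamming: int = 3,       # hypothetical threshold
            min_jaccard: float = 0.5,   # hypothetical threshold
            ) -> list[set[str]]:
    """Link similar observations, then return connected components."""
    parent = {key: key for key in observations}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (ka, a), (kb, b) in combinations(observations.items(), 2):
        sa, sb = a.get("simhash"), b.get("simhash")
        close = sa is not None and sb is not None and hamming(sa, sb) <= max_hamming
        shared = jaccard(a.get("ja3", set()), b.get("ja3", set())) >= min_jaccard
        if close or shared:
            parent[find(ka)] = find(kb)  # union the two components

    groups: dict[str, set[str]] = {}
    for key in observations:
        groups.setdefault(find(key), set()).add(key)
    return list(groups.values())


observations = {
    "a": {"simhash": 0b11110000},
    "b": {"simhash": 0b11110001},
    "c": {"ja3": {"x"}},
    "d": {"ja3": {"x", "y"}},
    "e": {"simhash": 0b00001111, "ja3": {"z"}},
}
clusters = cluster(observations)
print(sorted(sorted(group) for group in clusters))
```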
---
## Testing
```bash
source .311/bin/activate
# Schema invariants (table exists, FK targets, nullable columns,
# constraint blocks orphans, schema_version defaults to 1).
pytest tests/db/test_identity_schema.py -v
# API surface against the empty table.
pytest tests/web/test_api_identities.py -v
# Topic constants and builder.
pytest tests/bus/test_topics.py -v -k identity
```
---
See also: [Campaign-Clustering](Campaign-Clustering) (the next layer
up), [Service-Bus](Service-Bus) (topic table),
[Module-Reference-Web](Module-Reference-Web).

@@ -151,6 +151,9 @@ Current topic families:
| `attacker.observed` | Correlator | first sighting; consumed by `decnet enrich` as a wake signal |
| `attacker.scored` | Profiler | post-enrichment score update; also wakes `decnet enrich` |
| `attacker.intel.enriched` | `decnet enrich` | `{attacker_ip, aggregate_verdict, providers}` after a threat-intel pass; webhook → SIEM |
| `identity.formed` | _reserved (clusterer)_ | `{identity_uuid, observation_uuids: [...], confidence, first_seen_at}` — clusterer creates a new identity from one or more observations |
| `identity.observation.linked` | _reserved (clusterer)_ | `{identity_uuid, observation_uuid, confidence_after}` — observation attached / re-attached to an identity |
| `identity.merged` | _reserved (clusterer)_ | `{winner_uuid, loser_uuid, observation_uuids: [...], confidence_after}` — two identities collapsed; subscribers re-key cached references to the winner |
| `system.log` | _reserved_ | — |
| `system.bus.health` | Bus worker heartbeat | `{ts, uptime_s}` |

@@ -47,6 +47,7 @@
- [PKI-and-mTLS](PKI-and-mTLS)
- [Testing-and-CI](Testing-and-CI)
- [Campaign-Clustering](Campaign-Clustering)
- [Identity-Resolution](Identity-Resolution)
- [Performance-Story](Performance-Story)
- [Tracing-and-Profiling](Tracing-and-Profiling)