docs(wiki): Identity-Resolution page + identity.* topics in Service-Bus
Documents the observation/identity/campaign three-level hierarchy, the read-only API surface, the deferred clusterer worker, and how to test the substrate. Companion to development/IDENTITY_RESOLUTION.md in the main repo. Service-Bus.md gains the three identity.* topic rows (reserved for the future clusterer); sidebar links Identity-Resolution under Developer docs alongside Campaign-Clustering.
Identity-Resolution.md
@@ -0,0 +1,169 @@
# Identity Resolution

Pre-implementation feature. The clusterer worker that populates these
rows is a separate downstream effort; the substrate (schema, API,
frontend, bus topics) ships first so downstream work and the campaign
clustering fixtures can target a stable shape.

The full design lives in the repo at
[`development/IDENTITY_RESOLUTION.md`](https://github.com/dec-net/decnet/blob/main/development/IDENTITY_RESOLUTION.md).
This page documents the current substrate.

---

## The three-level hierarchy

DECNET's previous data model conflated two distinct concepts in one
table. The new model separates three levels:

| Level | Unit | How it's created | Stability |
|---|---|---|---|
| `attackers` | per-IP: "we saw activity from X starting at T" | profiler ingest, dumb / synchronous | mutable: IPs come and go |
| `attacker_identities` | per-actor: "these N observations are the same hands" | clusterer, async, on stable fingerprints (JA3, HASSH, payload, C2, kd_digraph_simhash) | semi-stable: tightens as evidence accumulates |
| `campaigns` | per-operation: "these M identities are coordinated" | clusterer, async, on shared infra / tooling / phase handoff | derived from identities |

`attackers` keeps its name and user-facing meaning ("the attacker the
operator clicked"). It plays the role of **observation** under the new
model: one row per source IP. The deduplicated "same hands" view lives
alongside it in `attacker_identities`.

Clustering at each step (per-IP observation → identity → campaign) is
the same problem at a different scale: grouping on progressively
higher-level signals.
See [Campaign-Clustering](Campaign-Clustering) for the campaign layer.

---

## Schema

### `AttackerIdentity` (new in `decnet/web/db/models/attackers.py`)

| Column | Type | Notes |
|---|---|---|
| `uuid` | TEXT PK | uuid4(); not fingerprint-derived |
| `schema_version` | INT, default 1 | Federation gossip compat from day one |
| `campaign_id` | TEXT FK nullable | Set by the campaign clusterer |
| `first_seen_at` / `last_seen_at` / `created_at` / `updated_at` | TIMESTAMP | |
| `confidence` | REAL nullable | Clusterer's identity-cohesion score |
| `observation_count` | INT default 0 | Denormalized; live count via API |
| `ja3_hashes` / `hassh_hashes` / `payload_simhashes` / `c2_endpoints` | JSON-in-TEXT nullable | Multi-tool actors get multiple values |
| `kd_digraph_simhash` | BINARY(8) nullable | V2 keystroke-dynamics hook |
| `merged_into_uuid` | TEXT self-FK nullable | Soft-merge audit trail |
| `notes` | TEXT nullable | Operator-editable annotations |

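
As a rough illustration of the column list above, the sketch below recreates the table in an in-memory SQLite database using only the Python standard library. It is not the project's model code (that lives in `decnet/web/db/models/attackers.py`): the DDL spelling, the trimmed-down `attackers` stand-in with its `ip` key, the index name, and the sample values are all assumptions made for the example, and the `campaign_id` foreign key is left as a plain column because no `campaigns` table is modeled here. It shows the `schema_version` default, the self-referencing soft-merge key, and a foreign key on the nullable `attackers.identity_id` column rejecting an orphan.

```python
import sqlite3
import uuid

# Hypothetical DDL approximating the documented columns; the real model may differ.
DDL = """
CREATE TABLE attacker_identities (
    uuid               TEXT PRIMARY KEY,
    schema_version     INTEGER NOT NULL DEFAULT 1,
    campaign_id        TEXT,   -- FK to campaigns in the real schema (not modeled here)
    first_seen_at      TIMESTAMP,
    last_seen_at       TIMESTAMP,
    created_at         TIMESTAMP,
    updated_at         TIMESTAMP,
    confidence         REAL,
    observation_count  INTEGER NOT NULL DEFAULT 0,
    ja3_hashes         TEXT,   -- JSON list stored as text
    hassh_hashes       TEXT,
    payload_simhashes  TEXT,
    c2_endpoints       TEXT,
    kd_digraph_simhash BLOB,   -- 8 bytes when present
    merged_into_uuid   TEXT REFERENCES attacker_identities(uuid),
    notes              TEXT
);
-- Trimmed-down stand-in for the existing attackers table.
CREATE TABLE attackers (
    ip          TEXT PRIMARY KEY,
    identity_id TEXT REFERENCES attacker_identities(uuid)
);
CREATE INDEX ix_attackers_identity_id ON attackers(identity_id);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(DDL)

# A bare insert picks up the documented defaults.
identity_uuid = str(uuid.uuid4())
conn.execute("INSERT INTO attacker_identities (uuid) VALUES (?)", (identity_uuid,))
schema_version, observation_count = conn.execute(
    "SELECT schema_version, observation_count FROM attacker_identities WHERE uuid = ?",
    (identity_uuid,),
).fetchone()

# Ingestion leaves identity_id NULL; a later pass links the row.
conn.execute("INSERT INTO attackers (ip) VALUES ('203.0.113.7')")
unresolved = conn.execute("SELECT identity_id FROM attackers").fetchone()[0]
conn.execute(
    "UPDATE attackers SET identity_id = ? WHERE ip = '203.0.113.7'", (identity_uuid,)
)

# Pointing at a non-existent identity is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO attackers (ip, identity_id) VALUES ('198.51.100.9', 'no-such-uuid')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```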
### `attackers.identity_id` (new column)

Nullable indexed FK to `attacker_identities.uuid`. NULL until the
clusterer resolves an identity for the row. Ingestion paths
(profiler, correlator) keep upserting `attackers` rows without
touching `identity_id`.

---

## API

All endpoints are read-only and auth-gated identically to
`/api/v1/attackers/*`.

| Method | Path | Returns |
|---|---|---|
| GET | `/api/v1/identities` | Paginated list, newest-updated first; excludes merged-out rows |
| GET | `/api/v1/identities/{uuid}` | Detail row + `observation_count_live`. Transparently follows `merged_into_uuid` to surface the canonical winner |
| GET | `/api/v1/identities/{uuid}/observations` | Paginated `Attacker` rows FK'd to the (resolved) identity uuid |

Until the clusterer starts populating rows, every request returns
either an empty result or a 404.

```bash
# Empty list while the clusterer hasn't run
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/v1/identities
# {"total": 0, "limit": 50, "offset": 0, "data": []}

# 404 for any uuid
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/v1/identities/00000000-0000-0000-0000-000000000000
# {"detail": "Identity not found"}
```
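
The detail endpoint's "transparently follows `merged_into_uuid`" behavior can be pictured as a pointer walk. The sketch below is a guess at the shape of that logic, not the handler's actual code: `resolve_canonical` and the in-memory `rows` dict are hypothetical stand-ins for a database lookup, and the cycle guard is a defensive assumption rather than something the design states.

```python
from typing import Dict, Optional


def resolve_canonical(rows: Dict[str, Optional[str]], uuid: str) -> Optional[str]:
    """Follow merged_into_uuid pointers until reaching a row that was not merged out.

    `rows` maps identity uuid -> merged_into_uuid (None for a canonical row).
    Returns None when the uuid is unknown (the API would answer 404), when a
    pointer dangles, or when the pointers form a cycle.
    """
    seen = set()
    current = uuid
    while current in rows:
        if current in seen:  # defensive: corrupted data with a merge loop
            return None
        seen.add(current)
        target = rows[current]
        if target is None:  # not merged into anything: this is the winner
            return current
        current = target
    return None  # unknown uuid or dangling pointer


# "a" was merged into "b", which was later merged into "c".
rows = {"a": "b", "b": "c", "c": None}
winner = resolve_canonical(rows, "a")
missing = resolve_canonical(rows, "zzz")
```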

---

## Bus topics

Constants ship in `decnet.bus.topics`; **no publishers exist yet**.
Subscribers can register against `identity.>` from day one and will
start receiving events as soon as the clusterer comes online.

| Topic | Payload | When |
|---|---|---|
| `identity.formed` | `{identity_uuid, observation_uuids: [...], confidence, first_seen_at}` | Clusterer creates a new identity from one or more observations |
| `identity.observation.linked` | `{identity_uuid, observation_uuid, confidence_after}` | Observation attached / re-attached to an identity |
| `identity.merged` | `{winner_uuid, loser_uuid, observation_uuids: [...], confidence_after}` | Two identities collapsed. Loser keeps its row with `merged_into_uuid` set; subscribers re-key cached references to the winner |

Topic strings are built via `topics.identity(IDENTITY_FORMED)` and the
like. See [Service-Bus](Service-Bus) for the full topic table.

`identity.campaign.assigned` is deferred and will land alongside the
campaign clusterer.

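
One way a subscriber might honor the "re-key cached references" contract for `identity.merged` is sketched below. The payload keys follow the topic table, but `apply_merged` and the cache layout (identity uuid mapped to a set of observation uuids) are hypothetical, and wiring the handler to the bus is left out because no publisher exists yet.

```python
from typing import Dict, Set


def apply_merged(cache: Dict[str, Set[str]], event: dict) -> None:
    """Move everything cached under the loser onto the winner, in place."""
    winner = event["winner_uuid"]
    loser = event["loser_uuid"]
    moved = cache.pop(loser, set())
    # Also take the event's observation list, in case the cache was stale.
    moved |= set(event.get("observation_uuids", []))
    cache.setdefault(winner, set()).update(moved)


cache = {"id-1": {"obs-a"}, "id-2": {"obs-b", "obs-c"}}
apply_merged(cache, {
    "winner_uuid": "id-1",
    "loser_uuid": "id-2",
    "observation_uuids": ["obs-b", "obs-c"],
    "confidence_after": 0.9,
})
```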

---

## Frontend

`decnet_web/src/components/IdentityDetail.tsx` renders the
`/identities/:id` route:

- Header with uuid, optional `CAMPAIGN · <prefix>` badge if assigned,
  optional `MERGED INTO <prefix>` link (clicks navigate to the winner).
- Stats row: live observation count, distinct JA3, HASSH, payload
  SimHashes, C2 endpoints.
- Confidence + schema version (only rendered if populated).
- Fingerprint detail tag lists for JA3, HASSH, C2 endpoints.
- Observations table (linked rows back to AttackerDetail).
- Optional analyst-notes panel.

`AttackerDetail` gains a conditional `IDENTITY · <prefix>` badge in
the header when `identity_id` is non-null. Clicking it navigates to
`/identities/<uuid>`. There is no behavior change while `identity_id`
is uniformly NULL.

---

## What hasn't been built yet

- **Clusterer worker.** Reads observations, computes fingerprint
  similarity (Hamming on simhashes, Jaccard / weighted edges on
  hash sets), runs connected-components, writes identities, publishes
  bus events. Designed in
  [Campaign-Clustering](Campaign-Clustering) §4 and the in-repo
  `CAMPAIGN_CLUSTERING.md`.
- **Identity-level intel** (`attacker_identity_intel`). Aggregate
  reputation, threat-actor naming from MISP/CTI, MITRE ATT&CK tags.
  Different lifecycle from the IP-scoped `attacker_intel` (DEBT-041);
  separate table, separate enricher. The current API aggregates
  observation intel on read in the meantime.
- **`SessionProfile.identity_id` FK.** Open question for V2 keystroke
  dynamics. Currently sessions FK to `Log`, not `Attacker` / identity.
- **Webhook payload `identity_id` enrichment.** To be added
  opportunistically once identities are populated.
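
The clusterer bullet names three building blocks: Hamming distance on simhashes, Jaccard similarity on hash sets, and connected components. A toy version of that pipeline follows, under loud assumptions: the thresholds, the 64-bit simhash width, the `Observation` shape, and the rule that either signal alone links two rows are invented for illustration. It uses plain unweighted edges rather than the weighted edges the design mentions, and the real worker is the one designed in Campaign-Clustering.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

MAX_HAMMING = 3    # hypothetical: simhashes this close count as the same payload
MIN_JACCARD = 0.5  # hypothetical: hash-set overlap needed to link two rows


@dataclass(frozen=True)
class Observation:
    uuid: str
    simhash: int  # assumed 64-bit payload simhash
    ja3: FrozenSet[str] = field(default_factory=frozenset)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two simhashes."""
    return bin(a ^ b).count("1")


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(observations: List[Observation]) -> List[Set[str]]:
    """Link similar observations, then return connected components (union-find)."""
    parent: Dict[str, str] = {o.uuid: o.uuid for o in observations}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(observations, 2):
        close_payload = hamming(a.simhash, b.simhash) <= MAX_HAMMING
        shared_hashes = jaccard(a.ja3, b.ja3) >= MIN_JACCARD
        if close_payload or shared_hashes:
            parent[find(a.uuid)] = find(b.uuid)

    groups: Dict[str, Set[str]] = {}
    for o in observations:
        groups.setdefault(find(o.uuid), set()).add(o.uuid)
    return list(groups.values())


obs = [
    Observation("o1", 0b1011_0000, frozenset({"ja3-x"})),
    Observation("o2", 0b1011_0001, frozenset({"ja3-y"})),  # 1 bit away from o1
    Observation("o3", 0xFFFF_0000, frozenset({"ja3-y"})),  # shares JA3 with o2
    Observation("o4", 0x0F0F_0F0F_0F0F, frozenset({"ja3-z"})),
]
identities = cluster(obs)  # o1, o2, o3 end up together; o4 stays alone
```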

---

## Testing

```bash
source .311/bin/activate

# Schema invariants (table exists, FK targets, nullable columns,
# constraint blocks orphans, schema_version defaults to 1).
pytest tests/db/test_identity_schema.py -v

# API surface against the empty table.
pytest tests/web/test_api_identities.py -v

# Topic constants and builder.
pytest tests/bus/test_topics.py -v -k identity
```

---

See also: [Campaign-Clustering](Campaign-Clustering) (the next layer
up), [Service-Bus](Service-Bus) (topic table),
[Module-Reference-Web](Module-Reference-Web).

@@ -151,6 +151,9 @@ Current topic families:
| `attacker.observed` | Correlator | first sighting; consumed by `decnet enrich` as a wake signal |
| `attacker.scored` | Profiler | post-enrichment score update; also wakes `decnet enrich` |
| `attacker.intel.enriched` | `decnet enrich` | `{attacker_ip, aggregate_verdict, providers}` after a threat-intel pass; webhook → SIEM |
| `identity.formed` | _reserved (clusterer)_ | `{identity_uuid, observation_uuids: [...], confidence, first_seen_at}`: clusterer creates a new identity from one or more observations |
| `identity.observation.linked` | _reserved (clusterer)_ | `{identity_uuid, observation_uuid, confidence_after}`: observation attached / re-attached to an identity |
| `identity.merged` | _reserved (clusterer)_ | `{winner_uuid, loser_uuid, observation_uuids: [...], confidence_after}`: two identities collapsed; subscribers re-key cached references to the winner |
| `system.log` | _reserved_ | — |
| `system.bus.health` | Bus worker heartbeat | `{ts, uptime_s}` |

@@ -47,6 +47,7 @@
- [PKI-and-mTLS](PKI-and-mTLS)
- [Testing-and-CI](Testing-and-CI)
- [Campaign-Clustering](Campaign-Clustering)
- [Identity-Resolution](Identity-Resolution)
- [Performance-Story](Performance-Story)
- [Tracing-and-Profiling](Tracing-and-Profiling)