Identity Resolution
anti edited this page 2026-04-26 07:15:55 -04:00

Status: pre-implementation. The clusterer worker that will populate attacker_identities rows is a separate downstream effort; the substrate (schema, API, frontend, bus topics) ships first so downstream work and the campaign clustering fixtures can target a stable shape.

The full design lives in the repo at development/IDENTITY_RESOLUTION.md. This page documents the current substrate.


The three-level hierarchy

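DECNET's previous data model conflated two distinct concepts in one table: the per-IP observation and the actor behind it. The new model separates them and adds a campaign level on top: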

| Level | Unit | How it's created | Stability |
|---|---|---|---|
| attackers | per-IP: "we saw activity from X starting at T" | profiler ingest, dumb / synchronous | mutable; IPs come and go |
| attacker_identities | per-actor: "these N observations are the same hands" | clusterer, async, on stable fingerprints (JA3, HASSH, payload, C2, kd_digraph_simhash) | semi-stable; tightens as evidence accumulates |
| campaigns | per-operation: "these M identities are coordinated" | clusterer, async, on shared infra / tooling / phase handoff | derived from identities |

attackers keeps its name and user-facing meaning ("the attacker the operator clicked"). It plays the role of observation under the new model — one row per source IP. The dedup'd "same hands" view lives alongside it in attacker_identities.

The clusterer (per-IP observation → identity → campaign) is the same problem at different scales: clustering on increasingly meta signals. See Campaign-Clustering for the campaign layer.


Schema

AttackerIdentity (new in decnet/web/db/models/attackers.py)

| Column | Type | Notes |
|---|---|---|
| uuid | TEXT PK | uuid4(); not fingerprint-derived |
| schema_version | INT, default 1 | Federation gossip compat from day one |
| campaign_id | TEXT FK nullable | Set by the campaign clusterer |
| first_seen_at / last_seen_at / created_at / updated_at | TIMESTAMP | |
| confidence | REAL nullable | Clusterer's identity-cohesion score |
| observation_count | INT default 0 | Denormalized; live count via API |
| ja3_hashes / hassh_hashes / payload_simhashes / c2_endpoints | JSON-in-TEXT nullable | Multi-tool actors get multiple values |
| kd_digraph_simhash | BINARY(8) nullable | V2 keystroke-dynamics hook |
| merged_into_uuid | TEXT self-FK nullable | Soft-merge audit trail |
| notes | TEXT nullable | Operator-editable annotations |
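
To make the column list concrete, here is a minimal sqlite3 sketch of the same shape. It is not the model code from decnet/web/db/models/attackers.py: the DDL is hypothetical, the campaign_id foreign key is left out because the campaigns table is not described on this page, and the timestamp nullability is an assumption.

```python
import json
import sqlite3
import uuid

# Hypothetical DDL mirroring the column table; the real model may differ.
DDL = """
CREATE TABLE attacker_identities (
    uuid               TEXT PRIMARY KEY,
    schema_version     INTEGER DEFAULT 1,
    campaign_id        TEXT,              -- FK target omitted in this sketch
    first_seen_at      TIMESTAMP,
    last_seen_at       TIMESTAMP,
    created_at         TIMESTAMP,
    updated_at         TIMESTAMP,
    confidence         REAL,
    observation_count  INTEGER DEFAULT 0,
    ja3_hashes         TEXT,              -- JSON list stored as TEXT
    hassh_hashes       TEXT,
    payload_simhashes  TEXT,
    c2_endpoints       TEXT,
    kd_digraph_simhash BLOB,              -- 8 bytes when populated
    merged_into_uuid   TEXT REFERENCES attacker_identities(uuid),
    notes              TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# The primary key is a random uuid4, not derived from any fingerprint.
identity_uuid = str(uuid.uuid4())
conn.execute(
    "INSERT INTO attacker_identities (uuid, ja3_hashes) VALUES (?, ?)",
    (identity_uuid, json.dumps(["ja3-a", "ja3-b"])),  # placeholder values
)

# Columns that were not supplied fall back to their defaults or NULL.
row = conn.execute(
    "SELECT schema_version, observation_count, confidence, ja3_hashes "
    "FROM attacker_identities WHERE uuid = ?",
    (identity_uuid,),
).fetchone()
```

Storing the fingerprint sets as JSON text keeps one row per identity even when an actor uses several tools; readers decode with json.loads on the way out.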

attackers.identity_id (new column)

Nullable indexed FK to attacker_identities.uuid. NULL until the clusterer resolves an identity for the row. Ingestion paths (profiler, correlator) keep upserting attackers rows without touching identity_id.
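
The separation between ingest and clustering can be sketched with sqlite3 as follows. The attackers columns other than identity_id (ip as key, event_count) are hypothetical stand-ins, as are the helper names; only the nullable indexed FK and the "ingest never touches identity_id" rule come from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE attacker_identities (uuid TEXT PRIMARY KEY);
CREATE TABLE attackers (
    ip          TEXT PRIMARY KEY,            -- hypothetical key, one row per source IP
    event_count INTEGER NOT NULL DEFAULT 0,  -- hypothetical ingest counter
    identity_id TEXT REFERENCES attacker_identities(uuid)
);
CREATE INDEX ix_attackers_identity_id ON attackers(identity_id);
""")


def ingest(ip):
    """Ingest-side upsert: it neither reads nor writes identity_id."""
    conn.execute(
        "INSERT INTO attackers (ip, event_count) VALUES (?, 1) "
        "ON CONFLICT(ip) DO UPDATE SET event_count = event_count + 1",
        (ip,),
    )


def live_observation_count(identity_uuid):
    """Count attackers rows currently linked to the identity."""
    return conn.execute(
        "SELECT COUNT(*) FROM attackers WHERE identity_id = ?",
        (identity_uuid,),
    ).fetchone()[0]


# Two ingests leave identity_id NULL.
ingest("203.0.113.7")
ingest("203.0.113.7")

# A (future) clusterer resolves the row to an identity.
conn.execute("INSERT INTO attacker_identities (uuid) VALUES ('id-1')")
conn.execute("UPDATE attackers SET identity_id = 'id-1' WHERE ip = '203.0.113.7'")

# A later ingest bumps the counter and leaves the link untouched.
ingest("203.0.113.7")
```

The same COUNT query is one plausible way to derive a live observation count next to the denormalized observation_count column.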


API

All endpoints are read-only and auth-gated identically to /api/v1/attackers/*.

| Method | Path | Returns |
|---|---|---|
| GET | /api/v1/identities | Paginated list, newest-updated first; excludes merged-out rows |
| GET | /api/v1/identities/{uuid} | Detail row + observation_count_live. Transparently follows merged_into_uuid to surface the canonical winner |
| GET | /api/v1/identities/{uuid}/observations | Paginated Attacker rows FK'd to the (resolved) identity uuid |
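
The merge-following behaviour of the detail endpoint can be illustrated with a small pure-Python sketch. The function names, the dict-based row store, and the hop limit are assumptions made for illustration; the actual handler may resolve merges differently.

```python
def resolve_canonical(identities, start_uuid, max_hops=32):
    """Follow merged_into_uuid links until a row that was not merged out.

    `identities` maps uuid -> row dict. Returns the canonical row, or None
    when the uuid is unknown, the chain dangles, loops, or is too long.
    """
    seen = set()
    current = identities.get(start_uuid)
    while current is not None:
        if current["uuid"] in seen or len(seen) >= max_hops:
            return None  # cycle or runaway chain: refuse to guess a winner
        seen.add(current["uuid"])
        target = current.get("merged_into_uuid")
        if target is None:
            return current  # canonical winner
        current = identities.get(target)
    return None


def visible_in_list(rows):
    """Rows shown by the list view: merged-out rows are excluded."""
    return [row for row in rows if row.get("merged_into_uuid") is None]
```

Under these assumptions a lookup for a merged-out uuid returns the winner's row, and an unknown uuid yields None, which maps naturally onto a 404.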

Until the clusterer starts writing rows, every response is empty: the list endpoints return an empty page and the detail endpoints return 404.

# Empty list while the clusterer hasn't run
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/v1/identities
# {"total": 0, "limit": 50, "offset": 0, "data": []}

# 404 for any uuid while the table is empty
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/v1/identities/00000000-0000-0000-0000-000000000000
# {"detail": "Identity not found"}
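
The same calls from Python, as a stdlib-only sketch. The helper names are hypothetical, and passing limit and offset as query parameters is an assumption inferred from the fields in the list response; the base URL is the one used in the curl examples.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # host and port from the curl examples


def get_json(path, token, opener=urllib.request.urlopen):
    """GET a path under BASE_URL; return parsed JSON, or None on 404."""
    request = urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with opener(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # identity not known (yet)
        raise


def iter_identities(token, limit=50, opener=urllib.request.urlopen):
    """Yield every identity row, paging until the list is exhausted."""
    offset = 0
    while True:
        # Assumed query parameter names: limit, offset.
        page = get_json(f"/identities?limit={limit}&offset={offset}", token, opener)
        rows = page["data"]
        yield from rows
        offset += len(rows)
        if not rows or offset >= page["total"]:
            return
```

The opener argument exists only so the functions can be exercised without a running server; callers normally leave it at the default.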

Bus topics

Constants ship in decnet.bus.topics; no publishers exist yet. Subscribers can register against identity.> from day one and start receiving events the instant the clusterer comes online.

| Topic | Payload | When |
|---|---|---|
| identity.formed | {identity_uuid, observation_uuids: [...], confidence, first_seen_at} | Clusterer creates a new identity from one or more observations |
| identity.observation.linked | {identity_uuid, observation_uuid, confidence_after} | Observation attached / re-attached to an identity |
| identity.merged | {winner_uuid, loser_uuid, observation_uuids: [...], confidence_after} | Two identities collapsed. Loser keeps its row with merged_into_uuid set; subscribers re-key cached references to the winner |
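
A subscriber-side sketch of the re-keying that identity.merged asks for. The cache layout (identity uuid mapped to a set of observation uuids) and the function name are hypothetical; the payload keys are the ones listed for identity.merged. Because this page does not say whether observation_uuids carries only the moved observations or the full post-merge set, the sketch uses a set union, which is safe under either reading.

```python
def apply_identity_merged(cache, payload):
    """Re-key cached observation sets from the loser identity to the winner.

    `cache` maps identity uuid -> set of observation uuids and is updated
    in place; it is also returned for convenience.
    """
    winner = payload["winner_uuid"]
    loser = payload["loser_uuid"]

    # Move whatever was cached under the loser onto the winner.
    moved = cache.pop(loser, set())
    merged = cache.setdefault(winner, set())
    merged.update(moved)

    # Union in the uuids carried by the event itself.
    merged.update(payload["observation_uuids"])
    return cache
```

Applying the same event twice leaves the cache unchanged, which matters if the bus redelivers events.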

Topic strings are built via topics.identity(IDENTITY_FORMED) and the equivalent calls for the other events. See Service-Bus for the full topic table.
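
A rough sketch of how the builder and a wildcard subscription could fit together. Everything here is assumed: the constant values, the builder body, and the NATS-style wildcard semantics ('*' matches exactly one token, '>' matches one or more trailing tokens) are illustrative and not copied from decnet.bus.topics.

```python
# Hypothetical constant values; the real ones live in decnet.bus.topics.
IDENTITY_FORMED = "formed"
IDENTITY_OBSERVATION_LINKED = "observation.linked"
IDENTITY_MERGED = "merged"


def identity(event):
    """Build a topic string under the identity namespace."""
    return f"identity.{event}"


def matches(pattern, topic):
    """NATS-style subject match on dot-separated tokens (assumed semantics)."""
    p_tokens = pattern.split(".")
    t_tokens = topic.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token.
            return i == len(p_tokens) - 1 and len(t_tokens) > i
        if i >= len(t_tokens) or (p != "*" and p != t_tokens[i]):
            return False
    return len(p_tokens) == len(t_tokens)
```

With these semantics a subscription on identity.> receives all three identity events, including the two-level identity.observation.linked, which a single-token wildcard would miss.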

identity.campaign.assigned is deferred and will land alongside the campaign clusterer.


Frontend

decnet_web/src/components/IdentityDetail.tsx → /identities/:id

  • Header with uuid, optional CAMPAIGN · <prefix> badge if assigned, optional MERGED INTO <prefix> link (clicks navigate to the winner).
  • Stats row: live observation count, distinct JA3, HASSH, payload SimHashes, C2 endpoints.
  • Confidence + schema version (only rendered if populated).
  • Fingerprint detail tag lists for JA3, HASSH, C2 endpoints.
  • Observations table (linked rows back to AttackerDetail).
  • Optional analyst-notes panel.

AttackerDetail gains a conditional IDENTITY · <prefix> badge in the header when identity_id is non-null. Click → /identities/<uuid>. Zero behavior change while identity_id is uniformly NULL.


What hasn't been built yet

  • Clusterer worker. Reads observations, computes fingerprint similarity (Hamming on simhashes, Jaccard / weighted edges on hash sets), runs connected-components, writes identities, publishes bus events. Designed in Campaign-Clustering §4 and the in-repo CAMPAIGN_CLUSTERING.md.
  • Identity-level intel (attacker_identity_intel). Aggregate reputation, threat-actor naming from MISP/CTI, MITRE ATT&CK tags. Different lifecycle than the IP-scoped attacker_intel (DEBT-041); separate table, separate enricher. The current API aggregates observation intel on read in the meantime.
  • SessionProfile.identity_id FK. Open question for V2 keystroke dynamics. Currently sessions FK to Log, not Attacker / identity.
  • Webhook payload identity_id enrichment. To be added opportunistically once identities are populated.
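
The clusterer described in the first bullet can be pictured with a toy sketch: pairwise similarity (Hamming distance on 64-bit simhashes, Jaccard on hash sets) feeding connected components via union-find. The thresholds, the observation dict layout, and the function names are hypothetical, and the pairwise loop is quadratic; the real design is in Campaign-Clustering and CAMPAIGN_CLUSTERING.md and may weight edges rather than threshold them.

```python
from itertools import combinations


def hamming64(a, b):
    """Number of differing bits between two 64-bit simhashes."""
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as dissimilar."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def linked(x, y, max_hamming=3, min_jaccard=0.5):
    """Decide whether two observations share an edge (thresholds are made up)."""
    sx, sy = x["payload_simhash"], y["payload_simhash"]
    if sx is not None and sy is not None and hamming64(sx, sy) <= max_hamming:
        return True
    return (
        jaccard(x["ja3"], y["ja3"]) >= min_jaccard
        or jaccard(x["hassh"], y["hassh"]) >= min_jaccard
    )


def cluster(observations):
    """Group observation uuids into connected components."""
    parent = {o["uuid"]: o["uuid"] for o in observations}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for x, y in combinations(observations, 2):
        if linked(x, y):
            parent[find(x["uuid"])] = find(y["uuid"])

    groups = {}
    for o in observations:
        groups.setdefault(find(o["uuid"]), set()).add(o["uuid"])
    return sorted(groups.values(), key=sorted)
```

Connected components make linkage transitive: if A matches B on JA3 and B matches C on payload simhash, all three land in one identity even though A and C share nothing directly.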

Testing

source .311/bin/activate

# Schema invariants (table exists, FK targets, nullable columns,
# constraint blocks orphans, schema_version defaults to 1).
pytest tests/db/test_identity_schema.py -v

# API surface against the empty table.
pytest tests/web/test_api_identities.py -v

# Topic constants and builder.
pytest tests/bus/test_topics.py -v -k identity

See also: Campaign-Clustering (the next layer up), Service-Bus (topic table), Module-Reference-Web.