merge: testing → main (reconcile 2-week divergence)
338
development/IDENTITY_RESOLUTION.md
Normal file
@@ -0,0 +1,338 @@
# Identity Resolution — Design

**Status:** pre-implementation. This doc is the spec; code follows.

**Roadmap pressure:** Campaign Clustering (`CAMPAIGN_CLUSTERING.md`),
Keystroke Dynamics (`DEVELOPMENT_V2.md` §1), Federation
(`DEVELOPMENT_V2.md` §3).

## Premise

The `attackers` table is keyed per-IP: one row every time we observe
activity from a new source IP. That works for naive scoring, but it
conflates two distinct concepts:

- **Observation event.** "We saw activity from IP X starting at T1."
  Mutable; IPs come and go; the unit of *ingestion* on the wire.
- **Actor identity.** "These N observations are the same hands."
  Semi-stable; recovered from signals the attacker can't cheaply rotate
  (JA3, HASSH, payload hashes, C2 callbacks, eventually keystroke
  rhythm).

A campaign is then one level up: "these M identities are coordinated."
The clean ladder is **Observation → Identity → Campaign**, three
levels, each derived from the level below by clustering on
increasingly meta signals.

We will not ship a clusterer in this PR sequence. The plan here is the
**substrate the clusterer writes into** (schema, API, bus topics,
frontend hooks), landed empty so downstream work targets a stable
shape and the campaign clustering fixtures can encode honest
multi-row-per-actor scenarios.

Order of work, strictly:

1. This design doc.
2. Schema-only PR: `attacker_identities` table + nullable
   `attackers.identity_id` FK. Empty table, no production reads/writes.
3. Read-only API: `/api/v1/identities/*` returning empty lists / 404.
4. Frontend: conditional `IdentityDetail` page; `AttackerDetail`
   gains an "Identity: <link>" badge when populated.
5. Bus topics + wiki: declare topics, document, no publishers yet.
6. Test factory adapter: campaign factory emits N rows per
   IP-rotating actor with shared `truth_identity_id`. Unblocks
   fixture 2 (`vpn_hopping`) and beyond.
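
Step 6 is the easiest one to picture in code. The following is only a sketch of what such a factory adapter might emit; `make_rotating_actor` and every field name except `truth_identity_id` and `attacker_uuid` are assumptions made for illustration, not the project's actual factory API.

```python
import ipaddress
import uuid


def make_rotating_actor(n_ips: int, base_ip: str = "198.51.100.1") -> list[dict]:
    """Emit n_ips observation rows for one simulated IP-rotating actor.

    Each row has its own attacker_uuid and source IP, but all rows share one
    truth_identity_id, so a fixture can check whether a clusterer re-joins
    them. The helper name and the "ip" field are hypothetical.
    """
    truth_identity_id = str(uuid.uuid4())
    first_ip = ipaddress.ip_address(base_ip)
    return [
        {
            "attacker_uuid": str(uuid.uuid4()),
            "ip": str(first_ip + i),  # one fresh address per hop
            "truth_identity_id": truth_identity_id,
        }
        for i in range(n_ips)
    ]


rows = make_rotating_actor(4)
```

A `vpn_hopping`-style fixture would then assert that all four rows end up under a single identity once a clusterer exists.
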

The clusterer itself follows after fixtures 2–6 ship, on the
substrate this PR sequence builds.

---

## Why now, why not later

**Pre-v1 schema changes are nearly free.** SQLModel
`metadata.create_all()` picks up new tables; new nullable columns need
no backfill (note that `create_all()` does not alter existing tables,
so an already-initialized database gets the column via a rebuild or a
one-off `ALTER TABLE`); no Alembic until v1. Real production data is
currently small and replayable.

**Post-v1 the cost compounds.** Real attacker rows accumulate, FKs
proliferate, dashboard URLs get bookmarked, federation gossip locks
in `schema_version=1` payload shapes. Every month we wait, the
migration becomes harder.

**V2 keystroke dynamics needs an identity row.** `kd_digraph_simhash`
correlation is *the* feature that graduates fingerprint into identity.
It needs a row to attach to. Without it, the V2 work either rebuilds
this substrate from scratch, or hangs simhash off the per-IP
observation table, which means an IP-rotating actor's typing rhythm
gets fragmented across every IP they used.

**Federation gossip is identity-level.** Operators in different
geographies will never share an IP. They may share an identity.

---

## Why sibling-add, not rename

**Considered:** rename `attackers` → `attacker_observations`.
Eliminates the "attacker means observation" lie at the schema layer.

**Rejected.** Costs:

- 126 occurrences of `attacker_uuid` across the codebase, mid-migration
  churn directly on top of DEBT-041 (commit `3eb67c9`, just landed).
- Frontend `Attacker` → `Observation` mismatches the user's mental model.
  Operators click "show me the attacker," not "show me the
  observation." Splunk, ELK, MISP, every CTI platform keeps the
  user-facing concept stable and exposes identity resolution as a
  derived view.
- The lie is in *documentation*, not in code. Code already operates
  per-IP correctly; it's just named imprecisely. Fixing it via
  docstring + wiki is far cheaper than renaming.

**Adopted:** **sibling-add.** Keep the `attackers` table; document its
semantic role as "per-IP observation." Add `attacker_identities` as a
new sibling. Add nullable `attackers.identity_id` FK. The clusterer
populates identities. Existing code paths are unchanged. Frontend
`AttackerDetail` gains a conditional widget; new `IdentityDetail`
page aggregates observations.

The "Attacker" vocabulary continues to mean "what the operator clicks
in the dashboard": the per-IP observation row. "Identity" is the
analyst-facing concept, surfaced when the clusterer has resolved one.

---

## Schema

### `AttackerIdentity` (new)

| Column | Type | Notes |
|---|---|---|
| `uuid` | TEXT PK | uuid4(); identities are NOT fingerprint-derived (fingerprints evolve as the actor's tooling changes; the row's identity must outlive its current fingerprints) |
| `schema_version` | INT, default 1 | Federation-gossip compat from day one. Bumping feature definitions without a version field silently poisons other operators' clustering |
| `campaign_id` | TEXT FK nullable | Set by the campaign clusterer (downstream effort) |
| `first_seen_at` | TIMESTAMP | Earliest observation linked to this identity |
| `last_seen_at` | TIMESTAMP | Latest observation linked to this identity |
| `created_at` / `updated_at` | TIMESTAMP | Row audit |
| `confidence` | REAL nullable | Identity-cohesion score from clusterer; null until clusterer writes |
| `observation_count` | INT default 0 | Denormalized for cheap dashboard reads. Maintained by the clusterer when it links/unlinks |
| `ja3_hashes` | TEXT (JSON list) nullable | Multiple TLS stacks per actor possible (different tools, different hosts) |
| `hassh_hashes` | TEXT (JSON list) nullable | |
| `payload_simhashes` | TEXT (JSON list) nullable | 64-bit ints serialized as hex strings |
| `c2_endpoints` | TEXT (JSON list) nullable | Domain or IP, dedup'd |
| `kd_digraph_simhash` | BINARY(8) nullable | V2 keystroke-dynamics hook. Same shape as `SessionProfile.kd_digraph_simhash`; identity-level value is the centroid (or majority vote) across the identity's sessions |
| `merged_into_uuid` | TEXT self-FK nullable | Soft-merge audit trail. When the clusterer combines two existing identities, the loser's row stays in place with `merged_into_uuid` pointing at the winner, which preserves the audit trail without orphaning FKs |
| `notes` | TEXT nullable | Operator-editable. Free-form |

All clusterer-populated fields are nullable; the table ships empty and
is valid in that state.
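
The `kd_digraph_simhash` row mentions a majority vote across an identity's sessions. One plausible reading, sketched below, is a bitwise majority over the 64-bit session simhashes. The function name, the big-endian byte order, and the tie-breaks-to-zero rule are all assumptions; the design does not fix any of them.

```python
def majority_simhash(session_hashes: list[bytes]) -> bytes:
    """Bitwise majority vote over 8-byte simhashes (hypothetical helper).

    A result bit is 1 only when strictly more than half of the inputs have
    that bit set; ties resolve to 0. Byte order is assumed big-endian.
    """
    if not session_hashes:
        raise ValueError("need at least one session simhash")
    if any(len(h) != 8 for h in session_hashes):
        raise ValueError("every simhash must be exactly 8 bytes")

    values = [int.from_bytes(h, "big") for h in session_hashes]
    result = 0
    for bit in range(64):
        ones = sum((v >> bit) & 1 for v in values)
        if ones * 2 > len(values):
            result |= 1 << bit
    return result.to_bytes(8, "big")


sessions = [n.to_bytes(8, "big") for n in (0xFF00, 0xF0F0, 0x0FF0)]
identity_hash = majority_simhash(sessions)
print(identity_hash.hex())  # prints 000000000000fff0
```

A centroid in Hamming space over binary vectors reduces to the same per-bit vote, which is why the two options in the table are close to interchangeable here.
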

### `attackers` (extended)

One nullable column added:

| Column | Type | Notes |
|---|---|---|
| `identity_id` | TEXT FK nullable, indexed | References `attacker_identities.uuid`. NULL until the clusterer resolves an identity |

**Migration:** No Alembic migration needed pre-v1. SQLModel
`metadata.create_all()` creates the new `attacker_identities` table.
It does not alter tables that already exist, so on an
already-initialized database the `identity_id` column arrives via a
rebuild from replayable data or a one-off `ALTER TABLE ... ADD COLUMN`.
No data backfill (column is nullable).
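
One caveat worth making concrete: `create_all()` creates missing tables but does not add columns to a table that already exists. The sketch below shows an idempotent version of the schema step using stdlib `sqlite3` instead of SQLModel. It assumes a SQLite backend, an `attackers` primary key named `uuid`, and only a subset of the identity columns; `ensure_identity_schema` and the index name are hypothetical.

```python
import sqlite3


def ensure_identity_schema(conn: sqlite3.Connection) -> None:
    """Create attacker_identities and add attackers.identity_id if missing."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS attacker_identities (
            uuid TEXT PRIMARY KEY,
            schema_version INTEGER NOT NULL DEFAULT 1,
            confidence REAL,
            observation_count INTEGER NOT NULL DEFAULT 0,
            merged_into_uuid TEXT REFERENCES attacker_identities(uuid)
        )
        """
    )
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(attackers)")}
    if "identity_id" not in existing:
        conn.execute(
            "ALTER TABLE attackers ADD COLUMN identity_id TEXT "
            "REFERENCES attacker_identities(uuid)"
        )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS ix_attackers_identity_id "
        "ON attackers (identity_id)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attackers (uuid TEXT PRIMARY KEY, ip TEXT)")
conn.execute("INSERT INTO attackers VALUES ('obs-1', '203.0.113.7')")
ensure_identity_schema(conn)
ensure_identity_schema(conn)  # safe to run twice
```

Existing rows keep working untouched: their `identity_id` is simply NULL until a clusterer writes it.
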

---

## Where intel lives — both, with clear semantics

DEBT-041 (`3eb67c9`) just re-keyed `attacker_intel` on `attacker_uuid`
(observation level). That work is correct; we do **not** touch it
here.

**Observation-level intel** (`attacker_intel`, current):
- AbuseIPDB confidence, GreyNoise classification, abuse.ch matches,
  PTR records, GeoIP: all **IP-scoped facts**. An identity spanning
  40 IPs has 40 distinct AbuseIPDB verdicts. We must not lose that
  granularity.

**Identity-level intel** (`attacker_identity_intel`, deferred):
- Aggregate reputation (e.g. "this identity has been reported as
  malicious across 4 of 5 observed IPs").
- Threat-actor naming from MISP/CTI feeds, where naming is
  actor-scoped not IP-scoped.
- TTP / MITRE ATT&CK tags.

Different lifecycle (clusterer-driven, not enricher-driven), different
inputs (aggregates over observations, not direct API calls), so it
gets its own table and its own enricher when it ships. **Not in this
PR sequence.**

The IdentityDetail API (read side) aggregates observation intel on
read until the identity-level table exists.
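
Aggregate-on-read can be as small as a fold over the per-IP intel rows. The sketch below is illustrative only: `summarize_identity_intel` and the `reported_malicious` flag are hypothetical names, and the real `attacker_intel` columns may look different.

```python
def summarize_identity_intel(observation_intel: list[dict]) -> dict:
    """Roll per-IP intel rows up into one identity-level summary.

    The per-IP rows are left untouched; only a derived summary is returned,
    so no IP-scoped verdict is lost. Field names are assumptions.
    """
    total = len(observation_intel)
    reported = sum(1 for row in observation_intel if row.get("reported_malicious"))
    return {
        "observed_ips": total,
        "reported_malicious_ips": reported,
        "summary": f"reported as malicious across {reported} of {total} observed IPs",
    }


intel = [
    {"ip": f"203.0.113.{i}", "reported_malicious": i != 5}
    for i in range(1, 6)
]
summary = summarize_identity_intel(intel)
```

Because the summary is computed per request, it stays correct when the clusterer re-links observations, at the cost of one query over the identity's rows.
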

---

## Bus Topics

Three new topics. No publishers in this PR sequence: constants exist;
publishers ship with the clusterer.

| Topic | Payload | When |
|---|---|---|
| `identity.formed` | `{identity_uuid, observation_uuids: [], confidence, first_seen_at}` | Clusterer creates a new identity from one or more observations |
| `identity.observation.linked` | `{identity_uuid, observation_uuid, confidence_after}` | Clusterer attaches an observation to an existing identity (or re-attaches one previously linked elsewhere) |
| `identity.merged` | `{winner_uuid, loser_uuid, observation_uuids: [], confidence_after}` | Clusterer collapses two identities. The loser's row stays in place via `merged_into_uuid`; subscribers re-key any cached identity references to the winner |

**Deferred:** `identity.campaign.assigned`. To be added opportunistically
when the campaign clusterer ships. YAGNI before then.

**Wiki:** `Service-Bus.md` documents these in the same commit that
adds the constants (per the project's `feedback_wiki_bus_signals`
rule).
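
For orientation, here is roughly what the topic constants and a wildcard subscription check could look like. The constant names match the import used in the Verification section and the values come from the table above; `subject_matches` is a hypothetical helper that assumes NATS-style wildcards (`*` for one token, `>` for one or more trailing tokens), which the `identity.>` subscription under Open Questions suggests but the doc does not spell out.

```python
IDENTITY_FORMED = "identity.formed"
IDENTITY_OBSERVATION_LINKED = "identity.observation.linked"
IDENTITY_MERGED = "identity.merged"


def subject_matches(pattern: str, topic: str) -> bool:
    """Match a topic against a pattern, assuming NATS-style wildcards.

    '*' matches exactly one token; '>' matches one or more remaining tokens
    and is expected to be the last token of the pattern.
    """
    p_tokens = pattern.split(".")
    t_tokens = topic.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(t_tokens) > i  # at least one token must remain
        if i >= len(t_tokens):
            return False
        if p != "*" and p != t_tokens[i]:
            return False
    return len(p_tokens) == len(t_tokens)


identity_topics = [IDENTITY_FORMED, IDENTITY_OBSERVATION_LINKED, IDENTITY_MERGED]
all_match = all(subject_matches("identity.>", t) for t in identity_topics)
```

Under that assumption a subscriber on `identity.>` receives every topic in the table, including any added later, without a code change.
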

---

## API Surface

All new endpoints are read-only and auth-gated identically to
`/api/v1/attackers/*` (per `project_health_auth_gated`).

| Method | Path | Returns |
|---|---|---|
| GET | `/api/v1/identities` | Paginated list of identities. Response shape mirrors `AttackersResponse` |
| GET | `/api/v1/identities/{uuid}` | Identity row + aggregated intel summary (rolled up from FK'd observations) + campaign stub if assigned |
| GET | `/api/v1/identities/{uuid}/observations` | Paginated list of `Attacker` observation rows that FK to this identity |

While the table is empty, every endpoint returns either an empty list
or 404; both are valid responses.
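
The empty-table contract can be illustrated without any web framework. The two functions below are a framework-free sketch, not the real handlers: the names, the `limit`/`offset` parameters, and the `(status, body)` return shape are assumptions, and only the `data` and `total` keys are taken from this doc (the remaining `AttackersResponse` fields are not reproduced).

```python
from typing import Optional


def list_identities(rows: list[dict], limit: int = 50, offset: int = 0) -> tuple[int, dict]:
    """Return (status, body) for the list endpoint; empty input is still 200."""
    page = rows[offset:offset + limit]
    return 200, {"data": page, "total": len(rows)}


def get_identity(rows: list[dict], identity_uuid: str) -> tuple[int, Optional[dict]]:
    """Return (status, body) for the detail endpoint; unknown uuid is 404."""
    for row in rows:
        if row["uuid"] == identity_uuid:
            return 200, row
    return 404, None


empty_list = list_identities([])
missing = get_identity([], "00000000-0000-4000-8000-000000000000")
```

Keeping the list endpoint at 200 with an empty page means the frontend can ship against the final shape before any identity exists.
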

**`AttackerDetail` change** (frontend, not API): when
`attackers.identity_id` is non-null, render an "Identity: <uuid-link>"
badge linking to `/identities/<uuid>`. No change otherwise.

---

## Frontend

- **`AttackerDetail.tsx`**: conditional badge. Zero behavior change
  when `identity_id` is null.
- **`IdentityDetail.tsx`** (new): aggregates observations, fingerprint
  summary, intel summary, campaign link. Same visual vocabulary as
  `AttackerDetail` so operators feel at home.
- **Routing**: `/identities/:uuid` alongside `/attackers/:uuid`.
- Default browse remains "Attackers." There is no "Identities" tab
  in the main navigation until identities are populated; once they
  are, an "Identity Resolution" entry appears under the Analytics
  section (this is post-clusterer; out of scope here).

---

## Risks

1. **Confidence drift.** The clusterer can rewrite identity
   assignments as evidence accumulates. An observation linked to
   identity-A today may move to identity-B tomorrow. UI must surface
   this without alarming operators ("This observation has been
   re-attributed; previous identity remains as a soft-merged
   reference"). The `merged_into_uuid` chain is the audit trail.

2. ~~**API URL stability.**~~ Resolved in commit `dc3d08d`: the
   read-only API follows `merged_into_uuid` and surfaces the canonical
   winner. Loser UUIDs resolve to the winner row.

3. **Schema-version lock-in for federation.** `schema_version=1` is
   what we ship. Any fingerprint added to the identity row post-v1
   bumps the version. Operators behind by versions get a degraded
   gossip experience but should not crash: the receiver must
   tolerate unknown fields.

4. **Observation FK proliferation.** Today only `attackers` would
   carry `identity_id`. Tomorrow, `SessionProfile`, `AttackerIntel`,
   webhook payloads might want it too. Resist proliferation; the
   normalised path is `observation.identity_id` and identity-level
   facts go in `attacker_identity_intel`. We only carry `identity_id`
   on tables where joining via the observation row is materially
   slower at read time.

5. **Identity-level intel scope creep.** Easy to start moving DEBT-041
   intel up to identity level "for cleanliness." Don't. AbuseIPDB
   results are IP-scoped facts; moving them up loses information.
   Identity-level intel is *aggregate* intel, a different thing.
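
Risk 2 says loser UUIDs resolve to the winner by following `merged_into_uuid`. A minimal sketch of that walk is below. It is not the code from commit `dc3d08d`; `resolve_canonical`, the in-memory mapping, and the cycle guard are assumptions added for illustration.

```python
from typing import Optional


def resolve_canonical(merged_into: dict[str, Optional[str]], identity_uuid: str) -> str:
    """Follow merged_into_uuid links until a live (unmerged) identity is found.

    merged_into maps each identity uuid to its merged_into_uuid, or None for
    a live identity. An unknown uuid raises KeyError, which a handler would
    presumably turn into a 404.
    """
    seen: set[str] = set()
    current = identity_uuid
    while merged_into[current] is not None:
        if current in seen:
            raise ValueError(f"merge cycle detected at {current}")
        seen.add(current)
        current = merged_into[current]
    return current


chain = {"a": "b", "b": "c", "c": None}
winner = resolve_canonical(chain, "a")
```

Chains longer than one hop appear when a winner is itself merged later, so a single-hop lookup would not be enough to keep old URLs stable.
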

---

## Open Questions

1. ~~**Revocability of identity merges.**~~ **Resolved 2026-04-26:**
   merges are revocable. `identity.unmerged` topic ships in
   `decnet/bus/topics.py` alongside the existing three so subscribers
   on `identity.>` get it from day one. Clusterer clears
   `merged_into_uuid`, re-links observations, publishes
   `identity.unmerged` + a fresh `identity.formed` for the
   resurrected side.

2. ~~**`AttackerDetail` UX when `identity_id` changes.**~~ **Resolved
   2026-04-26:** SSE channel modeled on the topology-mutator SSE.
   New endpoint subscribes to `identity.>`, JWT via `?token=`,
   snapshot-on-connect + live forward. `AttackerDetail` and
   `IdentityDetail` consume it.

3. **`SessionProfile.identity_id` FK.** Does this PR sequence add it,
   or does it wait for V2 keystroke dynamics? Leaning **wait**: the
   FK is only useful when the identity-level keystroke similarity
   query exists, which is V2 work. Adding a column we don't read in
   v1 is unused complexity.

4. **Webhook payload `identity_id`.** Add opportunistically once
   identities are populated. Not load-bearing for this PR sequence.

5. **Identity-level intel table.** Schema sketch is straightforward
   (uuid PK, identity_uuid FK, source, confidence, ttps JSON,
   timestamps), but the enricher is meaningfully different from
   the IP-scoped one. Defer entirely.
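
The unmerge flow resolved in question 1 has three steps: clear `merged_into_uuid`, re-link observations, publish two events. The sketch below walks through them against SQLite. It is an assumption-heavy illustration: `unmerge_identity`, the `publish` callback, the `identity.unmerged` payload shape, and the `attackers.uuid` column name are all hypothetical, and the `identity.formed` payload omits `confidence` and `first_seen_at`.

```python
import sqlite3
from typing import Callable, Iterable


def unmerge_identity(
    conn: sqlite3.Connection,
    loser_uuid: str,
    observation_uuids: Iterable[str],
    publish: Callable[[str, dict], None],
) -> str:
    """Revoke a soft merge and return the former winner's uuid."""
    row = conn.execute(
        "SELECT merged_into_uuid FROM attacker_identities WHERE uuid = ?",
        (loser_uuid,),
    ).fetchone()
    if row is None or row[0] is None:
        raise ValueError(f"{loser_uuid} is not a merged identity")
    winner_uuid = row[0]
    obs = list(observation_uuids)

    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE attacker_identities SET merged_into_uuid = NULL WHERE uuid = ?",
            (loser_uuid,),
        )
        conn.executemany(
            "UPDATE attackers SET identity_id = ? WHERE uuid = ?",
            [(loser_uuid, o) for o in obs],
        )
        # Keep the denormalized counter in step with the links.
        for ident in (winner_uuid, loser_uuid):
            conn.execute(
                "UPDATE attacker_identities SET observation_count = "
                "(SELECT COUNT(*) FROM attackers WHERE identity_id = ?) "
                "WHERE uuid = ?",
                (ident, ident),
            )

    publish(
        "identity.unmerged",
        {"winner_uuid": winner_uuid, "loser_uuid": loser_uuid, "observation_uuids": obs},
    )
    publish("identity.formed", {"identity_uuid": loser_uuid, "observation_uuids": obs})
    return winner_uuid
```

Events are published only after the transaction commits, so a subscriber that reads the database on receipt sees the re-linked rows.
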

---

## What is explicitly NOT in this design

- The clusterer worker (`decnet/clustering/` worker bin). Designed in
  `CAMPAIGN_CLUSTERING.md` §4; lands on top of this substrate.
- `attacker_identity_intel` table.
- `SessionProfile.identity_id` FK.
- Webhook payload `identity_id` enrichment.
- Renaming `attackers` → `attacker_observations`. Considered, rejected.
- Identity-level federation gossip. The schema is federation-ready
  (schema_version, no operator-identifying fields); the gossip wire
  itself is V2.

---

## Verification

After all 5 implementation commits land (steps 2–6 of the order of work):

```bash
source .311/bin/activate

# Schema lands cleanly.
pytest tests/db/test_identity_schema.py -v

# API surface returns expected shapes against an empty identities table.
pytest tests/web/test_api_identities.py -v

# No regressions on the unchanged path.
pytest tests/web/ tests/profiler/ tests/correlation/ -v

# Bus topic constants importable; wiki updated.
python -c "from decnet.bus.topics import IDENTITY_FORMED, IDENTITY_OBSERVATION_LINKED, IDENTITY_MERGED; print('OK')"
test -f wiki-checkout/Identity-Resolution.md
grep -q "identity.formed" wiki-checkout/Service-Bus.md

# Factory adapter unblocks fixture 2.
pytest tests/clustering/test_campaign_factory.py -v
```

Manual smoke after schema + API + frontend:

- Run `decnet api`, then `decnet web`.
- Browse to an existing AttackerDetail page → no badge (identity_id is NULL).
- `GET /api/v1/identities` → `{"data": [], "total": 0, ...}`.
- `GET /api/v1/identities/<random-uuid>` → 404.