133 Commits

Author SHA1 Message Date
15b2e7ba5c refactor(db): split credentials.py into a credentials/ subpackage
Splits the 459-line credentials.py into two submixins plus a composing
CredentialsMixin in credentials/__init__.py:

  _core.py    (~190)  Credential capture: upsert, list, filters,
                      per-attacker / per-secret reads, attacker_uuid
                      backfill
  reuse.py    (~270)  CredentialReuse correlation: upsert, candidate
                      mining, list/get + the _enrich_with_secret helper
                      that lifts the printable/b64 from underlying rows

_merge_unique stays with reuse.py (its only caller).
_enrich_with_secret stays with reuse.py — it's an internal helper of
list_credential_reuses / get_credential_reuse_by_id, never called
from the capture path.
2026-04-28 16:05:57 -04:00
3d00de8fd3 refactor(db): split attackers.py into an attackers/ subpackage
Splits the 494-line attackers.py into five submixin files plus a
composing AttackersMixin in attackers/__init__.py:

  _core.py       (~95)   Attacker CRUD + _deserialize_attacker
  behavior.py    (~110)  AttackerBehavior + _deserialize_behavior
  sessions.py    (~50)   SessionProfile read/write
  smtp.py        (~70)   SmtpTarget per-attacker + cross-attacker views
  activity.py    (~190)  log-derived activity (commands, leaks,
                         artifacts, stored mail, session log, transcripts)

IdentitiesMixin.list_observations_for_identity calls
self._deserialize_attacker; MRO resolves it onto AttackersCoreMixin
through the composed SQLModelRepository class.
2026-04-28 15:46:28 -04:00
5e7d68fde3 refactor(db): split topology.py into a topology/ subpackage
Splits the 694-line topology.py into five submixin files plus a
composing TopologyMixin in topology/__init__.py:

  _core.py       (~225)  topologies CRUD + _assert_pending /
                         _check_and_bump_version concurrency guards
  lans.py        (~115)  LAN CRUD
  deckies.py     (~130)  topology decky CRUD + list_running_topology_deckies
  edges.py        (~80)  edge CRUD + status-event log
  mutations.py   (~165)  live reconciler queue (atomic claim + state writes)

Sibling submixins call self._assert_pending and
self._check_and_bump_version; MRO resolves them onto TopologyCoreMixin
through the composed SQLModelRepository class.
2026-04-28 15:16:42 -04:00
20e89eb0a6 refactor(db): extract TopologyMixin
Moves the 31 MazeNET topology methods (topologies CRUD, LANs, deckies,
edges, status events, mutation queue) into sqlmodel_repo/topology.py.
Includes _assert_pending and _check_and_bump_version concurrency
guards.

This is the last domain extraction; sqlmodel_repo/__init__.py is now
~165 lines: lifecycle (initialize/reinitialize/migrations), the admin
self-heal seed, get_state/set_state, and the mixin composition.
2026-04-28 15:11:14 -04:00
7483d01311 refactor(db): extract IdentitiesMixin and CampaignsMixin
Splits the AttackerIdentity and Campaign clustering reads/writes into
sqlmodel_repo/identities.py and sqlmodel_repo/campaigns.py.

Both call _deserialize_attacker (identities only) which resolves
through AttackersMixin via MRO.
2026-04-28 15:07:39 -04:00
912171d053 refactor(db): extract AttackersMixin
Moves the 19 attacker-domain methods (core CRUD, behavior, sessions,
smtp targets, log-derived activity views) plus the _deserialize_attacker
and _deserialize_behavior helpers into sqlmodel_repo/attackers.py.
2026-04-28 15:04:51 -04:00
7ba8bafcaa refactor(db): extract CredentialsMixin
Moves the 12 credential and credential-reuse methods (incl. the
_merge_unique and _enrich_with_secret helpers) into
sqlmodel_repo/credentials.py.
2026-04-28 15:00:04 -04:00
5b1af331b9 refactor(db): extract CanaryMixin
Moves the 13 canary blob/token/trigger methods into
sqlmodel_repo/canary.py.
2026-04-28 14:55:52 -04:00
03b3c8855c refactor(db): extract OrchestratorMixin
Moves the 9 orchestrator event/email log + prune methods into
sqlmodel_repo/orchestrator.py.
2026-04-28 14:54:20 -04:00
555cd13f09 refactor(db): extract RealismMixin
Moves the 8 synthetic-file + realism-config methods into
sqlmodel_repo/realism.py.
2026-04-28 14:52:59 -04:00
9b845269c9 refactor(db): extract LogsMixin
Moves the 8 log methods (incl. get_stats_summary aggregator) into
sqlmodel_repo/logs.py. get_log_histogram remains an abstract dialect
override point; sqlite/mysql subclasses still override it via MRO.
2026-04-28 14:51:35 -04:00
a0aeba5abc refactor(db): extract FleetMixin and promote JSON helpers
Moves the 6 fleet-decky methods (incl. cross-source list_running_deckies
aggregator) into sqlmodel_repo/fleet.py. _serialize_json_fields and
_deserialize_json_fields move to _helpers.py since they're shared
across fleet, topology, and canary.
2026-04-28 14:50:01 -04:00
d989cd0461 refactor(db): extract WebhooksMixin
Moves the 9 webhook-subscription methods (CRUD + delivery
bookkeeping) into sqlmodel_repo/webhooks.py.
2026-04-28 14:47:42 -04:00
167f140b0e refactor(db): extract BountiesMixin
Moves the 5 bounty methods plus the cross-table purge_logs_and_bounties
helper into sqlmodel_repo/bounties.py.
2026-04-28 14:46:39 -04:00
c6804d79b6 refactor(db): extract DeckiesMixin
Moves the 4 decky-shard CRUD methods into sqlmodel_repo/deckies.py.
2026-04-28 14:45:15 -04:00
eebf9e4c97 refactor(db): extract AuthMixin
Moves the 7 user CRUD methods into sqlmodel_repo/auth.py.
_ensure_admin_user stays in __init__.py so DECNET_ADMIN_PASSWORD
remains addressable at the module path tests already monkeypatch.
2026-04-28 14:43:49 -04:00
99adbebe75 refactor(db): extract SwarmMixin
Moves the 7 swarm-host CRUD methods into sqlmodel_repo/swarm.py.
2026-04-28 14:42:58 -04:00
85c914e754 refactor(db): extract AttackerIntelMixin
Moves upsert_attacker_intel, get_attacker_intel_by_uuid,
and get_unenriched_attackers into sqlmodel_repo/attacker_intel.py.
Composed onto SQLModelRepository via mixin inheritance.
2026-04-28 14:40:36 -04:00
e16f47ad24 refactor(db): extract _safe_session/_detach_close to _helpers.py
Module-level session helpers move into sqlmodel_repo/_helpers.py.
__init__.py re-exports them so external import paths
(decnet.web.db.sqlmodel_repo._safe_session) keep resolving.
2026-04-28 14:38:26 -04:00
4167345d51 refactor(db): convert sqlmodel_repo.py to a package
Pure rename — the old monolithic 3505-line file becomes
decnet/web/db/sqlmodel_repo/__init__.py. No code changes.
Subsequent commits will extract per-domain mixins out of __init__.py
to mirror the topical layout used by decnet/web/db/models/.
2026-04-28 14:37:18 -04:00
72cc928ebf feat(prober-cert): roll up fingerprints onto AttackerIdentity
Brings the federation-gossip columns on AttackerIdentity to life —
ja3_hashes, hassh_hashes, and the new tls_cert_sha256 — by projecting
the union of every member observation's fingerprints JSON onto the
identity at clusterer create / link / merge time.

- decnet/profiler/identity_rollup.py: pure extract_fp_summaries()
  reads the production bounty shape (payload.fingerprint_type +
  payload.{ja3,hash,cert_sha256}) and returns deduped+sorted JSON
  list[str] per family, or None when a family has no signal so the
  column stays NULL instead of '[]'.
- BaseRepository.update_identity_fingerprints + SQLModel impl: one
  idempotent write that overwrites the three summary columns and
  bumps updated_at.
- ConnectedComponentsClusterer: after every per-component
  reconciliation (fresh-create OR existing-merge+link), recomputes
  and writes the rollup for the target identity. Wrapped in a
  best-effort helper so a write failure logs but never breaks the
  tick.
- Tests: extract_fp_summaries unit (dedup, sort determinism,
  unknown types ignored, malformed JSON, nested-stringified
  payloads, non-string values); end-to-end clusterer ticks
  populate the columns on create + on later observation links;
  no-fingerprint clusters keep the columns NULL.
2026-04-28 11:28:54 -04:00
4749c972e5 feat(prober-cert): schema for active TLS cert capture
Adds storage for TLS certificate details collected from attacker-run
servers by the active prober (sibling to the existing JARM probe).

- AttackerIdentity.tls_cert_sha256 / Campaign.tls_cert_sha256:
  JSON list[str] columns mirroring ja3_hashes / hassh_hashes for
  federation gossip.
- ingester clause 9b: emits a 'tls_certificate' fingerprint bounty
  when a prober event carries subject_cn (disjoint from the existing
  sniffer-gated clause).
- Prober-side capture (ssl.wrap_socket follow-up after JARM) and
  profiler rollup land in sibling commits.
2026-04-28 11:09:25 -04:00
2cc60bd677 feat(realism): operator-tunable planner weights via realism_config
New realism_config table (uuid PK + unique key) + two repo methods
(get/set) backs an admin-only GET/PUT /api/v1/realism/config surface.

The planner now exposes apply_payload(payload) / current_payload() /
reset_to_defaults() and reads its weights through mutable module
globals; pick() resolves the live values each call. Validation
catches negative weights, zero totals, out-of-range canary_probability,
unknown content_class names, and silently drops cross-list entries
(canary class on the user list, etc).

The orchestrator worker calls _refresh_realism_config(repo) on
startup and every 5 ticks (~5min at 60s interval). Operator changes
land within one refresh window with no bus signal — the simpler path
for a knob whose latency tolerance is minutes.
2026-04-27 18:00:08 -04:00
da3c35c6a4 fix(realism): synthetic_files path fits MySQL utf8mb4 index cap
The (decky_uuid VARCHAR(64), path VARCHAR(1024)) UNIQUE constraint
generated a 4352-byte composite key under utf8mb4 (4 bytes/char),
busting MySQL's 3072-byte cap and crashing decnet api on init with:

    Specified key was too long; max key length is 3072 bytes

Tighten path to VARCHAR(512) — (64+512)*4 = 2304 bytes, well under
the cap. Real realism + canary placement paths are short
(/home/<persona>/Documents/<file>, ~70 chars); 512 keeps headroom
without the index hassle. Pre-v1, no migration helper.

Adds a regression test pinning the (decky_uuid + path) byte budget so
a future widening fails loudly in CI rather than at MySQL deploy
time.
2026-04-27 17:55:35 -04:00
87cb61c8b2 feat(realism): synthetic-files browser API
Adds GET /api/v1/realism/synthetic-files (paginated list, filters by
decky_uuid, persona, content_class) and
GET /api/v1/realism/synthetic-files/{uuid} (single row with last_body
and a truncated:bool flag set when the stored body is at the 64KB cap).

Repo gains count_synthetic_files() and get_synthetic_file(uuid). The
list view drops last_body to keep the wire payload bounded; the detail
endpoint is the only path that returns it. Read-only — orchestrator
remains the sole writer.
2026-04-27 17:44:53 -04:00
7e9bc6d49a refactor(realism): enforce synthetic_files 64KB cap at the repo
The orchestrator worker clipped last_body at write time, but the repo
didn't enforce. A future caller that forgot the clip would write the
full body. Move the clip to record_synthetic_file and
update_synthetic_file via SYNTHETIC_FILE_BODY_LIMIT in
decnet/web/db/models/realism.py. Worker now passes the full body and
trusts the repo. Tests retargeted to assert repo enforcement.
2026-04-27 17:37:36 -04:00
cb1872c52f feat(realism): synthetic_files table + planner wiring + scheduler swap
Stage 3 of the realism migration. Replaces orchestrator/scheduler.py's
hardcoded _FILE_TEMPLATES/_USERS (3 templates emitting epoch-suffixed
filenames like notes-1777315854.txt with identical bodies per
template) with a persona-driven realism engine.

New surface:

- SyntheticFile SQLModel (synthetic_files table, UNIQUE on
  decky_uuid+path) — per-(decky, path) state for the future
  edit-in-place flow. Pre-v1, no _migrate_* helper.
- BaseRepository methods: record_synthetic_file,
  update_synthetic_file, list_synthetic_files,
  pick_random_synthetic_file_for_edit (used by stage 3b).
- realism/naming.py: per-content-class filename templates,
  persona-conditioned. /var/log/cron.log + logrotate skeleton for
  system-class; /home/<persona>/TODO.md, scratch.md, etc. for
  user-class. Anti-regression test pins "no 8+ digit decimals in
  basenames" (the realism failure today).
- realism/bodies.py: deterministic body templates per content_class.
  TODO body uses checkbox markdown, script body has a shebang, cron
  body matches syslog cron shape ("CRON[PID]: (user) CMD (...)").
- realism/planner.py: pick(deckies, now, rng) returns a Plan.
  Diurnal-gated, weighted user/system content split (70/30 user
  bias). Create-only in stage 3; edit branch lands in stage 3b.

Scheduler split:

- scheduler.pick is now traffic-only (sync).
- scheduler.pick_file is async, takes a repo, resolves personas
  (Topology.email_personas for topology-source deckies; global
  realism.personas_pool otherwise), and maps Plan -> FileAction.
- FileAction gains persona/content_class/mtime fields.

Worker:

- _one_tick rolls 50/50 between traffic and file each tick. After a
  successful FileAction plant, _record_synthetic_file persists or
  patches the synthetic_files row (catching the unique-constraint
  collision on re-plant of the same path).
- SSHDriver._run_file passes action.mtime through to plant_file so
  files don't all stamp at wall-clock-now.
2026-04-27 16:22:07 -04:00
6a0d140e91 feat(db): canary token repository CRUD
Adds the abstract surface on BaseRepository and the SQLModel-backed
implementation (shared by SQLite and MySQL) for:

- canary blobs (upsert-by-sha256, list-with-refcount, refcount-aware delete)
- canary tokens (create, slug lookup, list with filters, state update)
- canary triggers (record+bump-counters atomically, list, attribute)

The triggers path is a single session that inserts the row and bumps the
parent token's counters together, so a subscriber that reads the token
right after the bus event sees the updated count. Blob delete refuses
while any token (including revoked) still references the blob; pre-v1
revoked tokens stick around for forensic value.
2026-04-27 12:48:24 -04:00
813f14bf2a feat(db): canary token tables (blob/token/trigger)
Three new tables for the canary tokens feature:

- canary_blobs       — operator-uploaded source artifacts, deduped by sha256
- canary_tokens      — one planted artifact in one decky; carries the
                       callback slug, generator/instrumenter, and lifecycle
- canary_triggers    — append-only log of every callback hit; attacker_id
                       back-filled by the correlator

Pydantic request/response shapes live in the same file per the
single-source-of-truth convention. No migrations file — pre-v1
SQLModel.metadata.create_all() covers it.
2026-04-27 12:45:41 -04:00
4badc75fb2 feat(emailgen): global persona pool + Date-stamped EML mtimes
Two changes that unwind earlier MazeNET-only assumptions and fix a
realism tell:

1. Persona resolution is now per-decky-source, not topology-only.  The
   scheduler walks the union view (list_running_deckies, including
   fleet MACVLAN/IPVLAN + SWARM shards) and picks the right persona
   list for each source:
     * topology decky -> Topology.email_personas (per-topology richness
       preserved)
     * fleet / shard  -> a single host-wide pool loaded from disk
       (DECNET_EMAILGEN_PERSONAS, /etc/decnet/email_personas.json, or
       ~/.decnet/email_personas.json)
   Operators install the global pool via 'decnet emailgen
   import-personas <file>' which validates with the same Pydantic
   schema the worker uses.

2. The driver now runs 'touch -d <Date>' inside the docker exec right
   after the EML write so file mtime matches the email's RFC 2822
   Date: header.  Without this an attacker 'ls -lt'ing the spool sees
   every email clustered inside the worker's tick window — the
   cluster itself was a stylometric tell.

CLI now exposes 'decnet emailgen' as a sub-app with 'run' (default,
backwards-compatible with bare 'decnet emailgen') and 'import-personas'.
list_running_deckies carries topology_id through so consumers can resolve
the parent topology without a second round-trip.
2026-04-26 22:39:16 -04:00
3ee55ec341 feat(emailgen): Ollama-driven fake email worker for IMAP/POP3 deckies
Second orchestrator worker (decnet emailgen) that drips persona-driven,
threaded, multi-language fake emails into running mail deckies.  Personas
live on Topology.email_personas; topology-wide language_default falls
through to any persona that doesn't pin its own.  Em-dashes are
suppressed at the prompt layer by default and only lifted for personas
explicitly marked uses_llms_heavily — em-dashes are an LLM tell and a
flat corpus of em-dashed mail is a giveaway.

EML delivery writes into /var/spool/decnet-emails/<thread>/<msg>.eml on
the mail decky via docker exec; wiring the IMAP/POP3 templates to read
from that spool (replacing the hardcoded _BAIT_EMAILS) is the next step.
2026-04-26 22:16:19 -04:00
9650366d34 fix(orchestrator): drop topology_deckies FK on event src/dst columns
Once the orchestrator started seeing fleet + SWARM shard sources via
list_running_deckies (a844148), every event row landing on a fleet decky
broke the FK to topology_deckies — the column now carries opaque ids
("local:omega-decky" for fleet, "host_uuid:decky_name" for shards) that
will never match topology_deckies.uuid.

Symptom on the operator's mothership:
  IntegrityError 1452 — orchestrator_events_ibfk_2 FK violated on every
  tick once the reconciler populated fleet_deckies.

Index on dst_decky_uuid is preserved (the dashboard reads
"events for this decky" frequently); only the FK is removed.  Keeps
data integrity loose by design — events are append-only history that
should outlive the deckies they reference.

Existing MySQL deployments need the FK dropped manually:
  ALTER TABLE orchestrator_events
    DROP FOREIGN KEY orchestrator_events_ibfk_2,
    DROP FOREIGN KEY orchestrator_events_ibfk_1;

SQLite users are unaffected — SQLite doesn't enforce FKs by default.
2026-04-26 21:40:06 -04:00
095500ae9a feat(db): FleetDecky table mirrors decnet-state.json into the DB
Adds a fleet_deckies table so DB-only consumers (orchestrator, web
dashboard, REST API) can see unihost / MACVLAN / IPVLAN deckies
without reading the JSON state file. Mirrors DeckyShard field-for-field.

Composite PK (host_uuid, name) future-proofs for a mothership that
runs both a local fleet and acts as a swarm master. host_uuid defaults
to the "local" sentinel — no FK to swarm_hosts because the local
mothership isn't enrolled as a worker.

Repo additions: upsert_fleet_decky, delete_fleet_decky,
list_fleet_deckies, list_running_fleet_deckies,
update_fleet_decky_state, plus list_running_deckies which unions
topology + fleet + shard sources for the orchestrator.

Smoke-tested round-trip against MySQL: upsert, list_running, union
view (source="fleet"), delete.
2026-04-26 21:00:01 -04:00
5b5ff54fa2 feat(web): orchestrator events read API + SSE stream
GET /api/v1/orchestrator/events — paginated list with optional
kind=traffic|file filter. GET /api/v1/orchestrator/events/stream —
SSE: snapshot on connect, live forward of orchestrator.> bus events
mapped to 'traffic' / 'file' SSE event names.

Repo gains list_orchestrator_events(limit, offset, kind?, since_ts?),
count_orchestrator_events(kind?), and prune_orchestrator_events
(per_dst_cap=10000) for periodic worker-side trimming.
2026-04-26 19:58:12 -04:00
4c37ece39e feat(orchestrator): MVP synthetic life-injection worker (SSH only)
Adds a new decnet orchestrate worker whose job is to keep the honeypot
ecosystem from looking suspiciously static — a frozen LAN with no
inter-host traffic and no filesystem aging is its own honeypot tell.

MVP scope:
- New OrchestratorEvent table + repo methods (purpose-built sibling
  to Log so synthetic events stay separable from attacker-driven ones).
- New orchestrator.{activity,file}.<decky_id> bus topics +
  system.orchestrator.health heartbeat.
- SSH-only driver. Traffic action runs python3 inside src container
  to TCP-connect dst:22 and read the SSH banner — real on-the-wire
  SSH-protocol traffic without shipping creds. File action drops or
  refreshes a small file via docker exec on the destination.
- Random scheduler (50/50 traffic/file when >=2 SSH-capable deckies
  are running). Diurnal shaping, role-aware pairing, and session-aware
  backoff are explicit non-goals for MVP.
- CLI registration, systemd unit (SupplementaryGroups=docker),
  worker-registry entry so the dashboard shows orchestrator health.
- 11 tests: scheduler policy, driver argv shape + injection-safety,
  end-to-end one-tick integration with FakeBus + SQLite.
2026-04-26 19:43:20 -04:00
75af00c9c8 test(clustering): full-bound passes through production campaign clusterer
Runs the chained identity + campaign clustering pipeline against all
seven fixtures via from_synthetic / from_synthetic_identity adapters
and ratchets every YAML floor to 1.0 — the production clusterer
(and the reference clusterers used in the per-fixture tests) all
score perfectly across ARI / homogeneity / completeness /
singleton_recall on each fixture.

Three substrate fixes surfaced by the ratchet:

- Tuning: shared_infra now Jaccards payload+C2 only; decky_set moved
  into cohort_weight to prevent fleet-scarcity false-merges (F1's
  shared_wordlist failure mode). Tier weight raised to 1.0 so
  shared payload+C2 alone crosses threshold (F5's intended pass).
- Adapter: from_synthetic_identity now reads SyntheticSession
  started_at + duration_s for session_windows and per-decky
  timestamps (the production-row adapter still uses start_ts/end_ts
  when available).
- Fixture data: paused_campaign.yaml's JA3 collided exactly with
  vpn_hopping.yaml's (same TLS extension list). The collision
  fused two unrelated campaigns under the chained identity layer
  in the noise_floor composite. Made paused's JA3 distinct.

Also wires Campaign / CampaignsResponse into models/__init__.py's
__all__ that was missed in the schema commit.
2026-04-26 09:13:59 -04:00
0a1cf65ddb feat(db): Campaign SQLModel + repo write/read methods
Adds the campaigns table and the BaseRepository / SQLModelRepository
methods that the campaign-clusterer worker (next commit) needs to
populate it. Mirrors the AttackerIdentity layer: schema_version from
day one for federation gossip, soft-merge via merged_into_uuid with a
chain-walking get_campaign_by_uuid, list_campaigns excluding merged-
out rows while list_all_campaigns returns the unfiltered set for the
revoke pass. attacker_identities.campaign_id gets a real FK now that
the target table exists.
2026-04-26 08:54:28 -04:00
e364ef8859 feat(clustering): revocable merges (merge + unmerge)
Reworks the clusterer's tick to handle multi-identity components and
re-evaluate prior merges. Two passes per tick:

Pass 1 — per-component reconciliation:
  * Fresh component → mint identity (commit 4 path).
  * Single-identity component → link unassigned observations.
  * Multi-identity component → soft-merge: pick the smallest-uuid
    winner deterministically, set merged_into_uuid on each loser,
    link unassigned observations to the winner. Observations stay
    FK'd to their original identity row — the merge is a soft
    pointer, not a re-point. Audit trail preserved; cached
    subscribers resolve through the chain.

Pass 2 — revocable-merge undo:
  * For each merged-out identity, check whether its observations
    still cluster with its winner's. If not, the merge is
    contradicted by new evidence — clear merged_into_uuid and emit
    identities_unmerged. The resurrected identity keeps its original
    uuid, so subscribers that cached it during the merged interval
    re-attach without a new lookup.

A pre-built merge-chain dict feeds Pass 1 so the effective-identity
lookup is O(1) per observation. The chain has a hop cap (paranoia
against accidental cycles in the underlying state).

Repo additions on BaseRepository + SQLModelRepository:
  * list_all_identities() — includes merged-out rows.
  * update_identity_merged_into(uuid, winner_or_None) — single
    setter for both merge and unmerge.
DummyRepo coverage stub updated.

Tests:
  * Two distinct identities bridged by a new observation merge with
    the smaller uuid as winner.
  * A pre-seeded soft-merge whose underlying observations diverge
    gets revoked; resurrected uuid emerges with merged_into_uuid
    cleared.
  * Tick is idempotent under no state changes.
2026-04-26 08:33:32 -04:00
de2f4c3a62 feat(clustering): wire high-weight edges end-to-end
The connected-components clusterer now writes attacker_identities
rows + sets attackers.identity_id when high-weight signals (JA3 /
HASSH / payload-hash / C2-endpoint exact match) agree across
observations. Singletons stay un-fingerprinted and un-clustered.

Algorithm split:
- cluster_observations(observations) — pure union-find over the
  high-weight edge function. Same code path for fixture validation
  and production tick.
- from_attacker_row(row) — production-row adapter; recovers JA3 +
  HASSH from Attacker.fingerprints JSON. Payload + C2 join from
  logs in later commits; the function shape doesn't change.

Repo additions on BaseRepository + SQLModelRepository:
- list_attackers_for_clustering(limit=None)
- create_attacker_identity(row)
- set_attacker_identity_id(attacker_uuid, identity_uuid)
DummyRepo coverage stub updated.

v1 behavior is conservative: only assigns identities to observations
whose identity_id is currently NULL. Multi-identity components are
skipped this pass — merge / re-assign lands in commit 10 with
revocable merges.

Fixture bounds tightened against the production clusterer:
- lone_wolf (F3) — singletons stay singletons
- shared_wordlist (F1) — credential-only overlap doesn't cluster
  (high-weight tier doesn't include credentials)
- vpn_hopping (F2, identity-level) — 5 rotated IPs with stable JA3
  + HASSH fold into one identity, ARI = 1.0, completeness = 1.0
2026-04-26 08:19:56 -04:00
dc3d08dd41 feat(web): read-only /api/v1/identities/* endpoints + repo methods
Second of the five-step identity-resolution substrate. Ships the API
surface against the empty AttackerIdentity table from commit 1 — every
endpoint returns empty/404 cleanly until the clusterer populates rows.

Routes (auth-gated, viewer role):
* GET /api/v1/identities — paginated list, excludes merged-out rows
* GET /api/v1/identities/{uuid} — detail; transparently follows
  merged_into_uuid to surface the canonical winner
* GET /api/v1/identities/{uuid}/observations — Attacker rows FK'd
  to the (resolved) identity uuid

Repository (BaseRepository abstract + SQLModelRepository concrete):
* get_identity_by_uuid (with merge-chain following, hop-bounded)
* list_identities / count_identities (excluding merged-out)
* list_observations_for_identity / count_observations_for_identity

Tests: 12 new (empty-table behavior, seeded data, merge-chain
resolution, repo-level smoke against real SQLite). Also fixes the
pre-existing test_base_repo_coverage failure (DEBT-041 added abstract
methods without updating the DummyRepo stub) — included here because
this PR adds 5 more abstract methods, fixing it as a bonus.

474 db/web/profiler/correlation tests green.
2026-04-26 07:08:55 -04:00
84c1ca9c9b feat(identity): AttackerIdentity table + nullable attackers.identity_id FK
Schema-only commit, first of the five-step substrate for identity
resolution. The clusterer that populates identities lands later; this
ships the table empty and the FK uniformly NULL on existing rows.

* decnet/web/db/models/attackers.py — new AttackerIdentity SQLModel
  (uuid PK, schema_version, fingerprint summary lists, kd_digraph_simhash,
  merged_into_uuid self-FK, all clusterer-populated fields nullable).
  Attacker grows a nullable indexed identity_id FK + docstring marking
  it as the per-IP observation row.
* decnet/web/db/models/__init__.py — re-exports AttackerIdentity.
* tests/db/test_identity_schema.py — 9 schema invariants: table exists,
  identity_id nullable + indexed, FK targets attacker_identities.uuid,
  schema_version defaults to 1, attacker rows inserted with NULL
  identity_id, FK constraint blocks orphans.

463 unrelated db/web/profiler/correlation tests still green. See
development/IDENTITY_RESOLUTION.md for the full design.
2026-04-26 07:00:24 -04:00
3eb67c9400 refactor(intel): re-key attacker_intel on attacker_uuid (closes DEBT-041)
The threat-intel surface was IP-keyed on day one as an expedient — the
worker is woken by IP-bearing bus events. ANTI's call: don't carry that
debt. NO IPs as primary keys anywhere on the attacker-intel surface.

Schema:
- attacker_uuid is now the canonical key — UNIQUE + FK to attackers.uuid.
- attacker_ip stays as a denormalised, indexed, NON-UNIQUE value column.
  Updated on every upsert; useful for SIEM payloads and audit lookups,
  but explicitly NOT a key. Model docstring says so.
- Pre-v1, no Alembic migration needed. SQLModel.metadata.create_all()
  builds the new shape on fresh DBs.

Repo:
- upsert_attacker_intel now keys on attacker_uuid.
- get_attacker_intel_by_ip → get_attacker_intel_by_uuid.
- get_unenriched_attacker_ips → get_unenriched_attackers, returning
  [{uuid, ip}] tuples so the worker writes by UUID and dispatches
  provider calls by IP without a second round-trip.

Worker:
- _enrich_one(uuid, ip, ...) — UUID lands on the row, IP rides for
  provider egress.
- attacker.intel.enriched bus payload gains attacker_uuid alongside
  attacker_ip — webhook → SIEM consumers benefit; no removal.

API:
- GET /api/v1/attackers/{ip}/intel deleted outright (rip-and-replace,
  never deployed beyond dev).
- GET /api/v1/attackers/{uuid}/intel is the only public route, matching
  every other /attackers/* route.

Frontend:
- <IntelPanel uuid={id!} /> uses the URL param directly, fetches in
  parallel with the rest of AttackerDetail rather than waiting on
  attacker.ip.

Tests: re-keyed in place, 39 passed (same coverage as before the
refactor). Provider-impl tests untouched.

DEBT-041: closed in DEBT.md (entry preserved as historical rationale,
summary table flipped to , remaining-open list shortened by one).
2026-04-26 05:35:29 -04:00
0dd3811436 feat(intel): attacker_intel table + repo helpers
New TTL-cached threat-intel row keyed by attacker IP, with per-provider
verdict/raw/queried_at columns for GreyNoise, AbuseIPDB, abuse.ch Feodo
Tracker and ThreatFox. Carries schema_version from day one (federation
wire-format precedent set by SessionProfile). Repo gains
upsert_attacker_intel, get_attacker_intel_by_ip, and a
get_unenriched_attacker_ips backfill primitive that picks fresh + stale
rows for the forthcoming 'decnet enrich' worker.

Also documents the open-source intel-source backlog in DEVELOPMENT_V2.
2026-04-26 04:56:47 -04:00
50870f2e7a feat(creds): surface plaintext/b64 secret on reuse findings
The CredentialReuse table only stores the sha256+kind hash of the
secret; the printable + b64 forms live on the underlying Credential
rows. The dashboard drawer was therefore showing only the hash, which
defeats most of the value of having a reuse view in the first place.

Repo helpers list_credential_reuses + get_credential_reuse_by_id now
issue one batched SELECT against credentials keyed on the sha256s in
the result page and graft secret_printable + secret_b64 onto each row
before returning. The drawer renders the same printable/b64 code-block
the credentials inspector uses.
2026-04-26 04:34:19 -04:00
590c2b0fac feat(correlation): credential-reuse engine + reuse-correlate worker
Adds CorrelationEngine.correlate_credential_reuse + the
`decnet reuse-correlate` long-running worker. The worker mirrors the
mutator's bus-wake + slow-tick pattern: wakes on credential.captured
and attacker.observed for sub-second latency, falls back to a 60s
poll if the bus is unavailable, and publishes
credential.reuse.detected once per new or grown CredentialReuse row
(group-deduped so a 5-cred reuse doesn't emit 5 partial events).

The web ingester now publishes credential.captured after every
successful Credential upsert; bus + new repo helper
find_credential_reuse_candidates feed the engine pass.
2026-04-26 03:37:49 -04:00
ce4be68501 feat(creds): cred-reuse foundation + vectorstore scaffold
Lays the storage and bus substrate for the "credential reuse patterns"
task in DEVELOPMENT.md and scaffolds decnet/vectorstore/ as the future
substrate for statistical attacker re-identification over behavioral
fingerprints. No correlator, profiler, API, or dashboard wiring in
this commit — see TODO.md for the handoff.

Schema:
  - Credential.attacker_uuid (nullable FK to attackers.uuid),
    backfilled by the profiler post-write to avoid coupling the
    capture path to the profiler's ordering.
  - CredentialReuse table — UUID PK, JSON list columns for the
    accumulating attacker_uuids/ips/deckies/services, target_count
    (the discriminative scalar), confidence reserved for a future
    fuzzy-credential pass.

Repo:
  - upsert_credential_reuse / list_credential_reuses /
    get_credential_reuse_by_id / update_credential_attacker_uuid.
  - Renamed pre-existing get_credential_reuse(secret_sha256) to
    get_credential_attempts_for_secret(secret_sha256) — the new
    findings table needs the cleaner name.

Bus topics:
  - credential.captured (one per Credential upsert)
  - credential.reuse.detected (correlator-emitted on insert/grow)

Vectorstore subpackage (decnet/vectorstore/, flat layout mirroring
decnet/bus/):
  - BaseVectorStore ABC keyed by (kind, id) — kind discriminator
    means new feature families are additive, no schema migration.
  - FakeVectorStore (in-memory L2 KNN), NullVectorStore (no-op for
    DECNET_VECTORSTORE_ENABLED=false), SqliteVecVectorStore (lazy
    sqlite_vec extension load, one vec0 virtual table per kind).
  - get_vectorstore() env-driven dispatch with graceful fallback
    to FakeVectorStore when the sqlite-vec extension isn't on the
    host, so workers don't crash on a missing optional dep.

Tests: 26 new (11 cred-reuse repo, 15 vectorstore). Existing
credentials and base-repo tests updated for the rename. Total: 34
passing on the touched files.
2026-04-26 03:18:34 -04:00
6b16c844b6 fix(creds): MQTT regression + secret_kind for hash credentials
Honest correction to the "every cred-emitting service" claim. Audit
of templates/* found three gaps:

1. MQTT — was working through the legacy adapter, silently dropped
   when Phase 3 (e696c2b) deleted it. Now migrated to encode_secret()
   alongside the others.
2. Postgres — `auth, pw_hash=…` event captures the MD5
   challenge-response the attacker sent. Plaintext irrecoverable, so
   it never fit the (principal, secret_b64=raw_bytes) shape. Lands
   in Credential as secret_kind="postgres_md5_challenge".
3. VNC — `auth_response, response=…hex` event captures the 16-byte
   DES-encrypted challenge. Same situation as Postgres: plaintext
   irrecoverable. Lands as secret_kind="vnc_des_response".

Adds a `secret_kind` discriminator column to Credential (default
"plaintext", indexed). The dedup tuple gains secret_kind so two
credentials with the same sha256 but different kinds are
fundamentally different rows — different challenges produce
different bytes for the same plaintext password, so cross-kind
reuse matches are meaningless and would only confuse analytics.

The model now genuinely covers every cred-emitting service in the
fleet:

  plaintext        SSH, Telnet, FTP, POP3, IMAP, SMTP, Redis, LDAP,
                   MQTT
  postgres_md5_*   Postgres
  vnc_des_response VNC

Username-only services (MySQL/MSSQL — TDS pre-encryption captures
the user but never sees the password byte) intentionally don't feed
Credential — they're recon signals, not cred attempts.

40 tests pass in the touched scope. New cases: secret_kind dedups
independently in the repo; Postgres MD5 + VNC DES emitters thread
through; MQTT round-trips through the native branch.
2026-04-25 06:16:57 -04:00
2f47f67eef feat(creds): future-proof Credential storage model
Replaces the opaque Bounty.bounty_type='credential' path with a
dedicated `credentials` table whose schema is forward-compatible
across every auth-bearing service in the fleet. Hoisted indexed
columns (secret_sha256, principal, service, attacker_ip) carry the
universal reuse-analytics signal; service-specific JSON keys ride
in `fields`. Cross-service reuse queries become an indexed lookup
on secret_sha256 instead of JSON_EXTRACT scans.

Schema decisions baked in (per ANTI):
- New `Credential` table, not extension to Bounty
- Hoisted `principal` column for cross-service principal-reuse
- Standardized JSON keys: every payload carries secret_b64 +
  secret_printable + principal universally; service-specific extras
  (user, domain, dn, mech, …) ride alongside

The auth-helper SD-block emits the new shape natively. The ingester
forks at _extract_bounty:
- Native shape (SSH/Telnet, future emitters): secret_b64 present →
  direct upsert_credential
- Legacy shape (FTP/POP3/IMAP/SMTP today): username + password →
  adapter synthesizes secret_{b64,sha256,printable} on the fly,
  upserts into the same Credential table. Tracked as DEBT-039;
  one-shot bridge until those service templates migrate.

Defense-in-depth across five layers (input validation):
- C helper: bytes outside [0x20, 0x7f) collapse to '?', RFC 5424
  escape rules for \\, ", ]; b64 preserves exact bytes
- Ingester native branch: rejects malformed secret_b64 (regex), drops
  the credential row but keeps the underlying Log
- Ingester legacy adapter: same printable-ASCII filter as the C
  code; sha256 + b64 over the original utf-8 bytes (lossless, even
  when secret_printable is sanitized)
- DB column caps with truncation warning; sha256 always over the
  full pre-truncation bytes so reuse queries match across truncation
- JSON serialized with ensure_ascii=True so utf8mb4 columns stay
  safe even with non-ASCII service-specific keys

Bounty.bounty_type='credential' is no longer written. Pre-v1: no
historical backfill; existing rows stay untouched but unused.

595 tests pass; new tests cover the model + repo (upsert dedup,
null-principal independence, cross-service reuse, filters), both
ingester branches, b64 validation, sanitization preserving the
fingerprinting signal in b64.
2026-04-25 05:29:26 -04:00
bcf460d2a5 feat(profiler): write ASN + AS name onto attacker rows
Adds asn (int), as_name (varchar 128), asn_source (varchar 16) to
the Attacker SQLModel — direct columns, no _migrate_* helper per
feedback_no_new_migrations_prev1.

Profiler worker now calls decnet.asn.enrich_ip alongside the existing
geoip enrich_ip; both feed the upsert payload. Failure is total — if
either lookup throws or the IP is private/unannounced, the field stays
None and the row still writes.

Both lookups are independent: a CGNAT address can have a country (RIR
allocation) but no ASN (no BGP origin), and vice-versa for unrouted
RIR-allocated space. Storing them separately preserves that signal.
2026-04-25 04:01:28 -04:00
ee176a6f79 Revert "feat(mazenet): per-LAN swarm host pin"
This reverts commit 0d92170a57.
2026-04-25 03:26:19 -04:00