Files
DECNET/TODO.md
anti ce4be68501 feat(creds): cred-reuse foundation + vectorstore scaffold
Lays the storage and bus substrate for the "credential reuse patterns"
task in DEVELOPMENT.md and scaffolds decnet/vectorstore/ as the future
substrate for statistical attacker re-identification over behavioral
fingerprints. No correlator, profiler, API, or dashboard wiring in
this commit — see TODO.md for the handoff.

Schema:
  - Credential.attacker_uuid (nullable FK to attackers.uuid),
    backfilled by the profiler post-write to avoid coupling the
    capture path to the profiler's ordering.
  - CredentialReuse table — UUID PK, JSON list columns for the
    accumulating attacker_uuids/ips/deckies/services, target_count
    (the discriminative scalar), confidence reserved for a future
    fuzzy-credential pass.

Repo:
  - upsert_credential_reuse / list_credential_reuses /
    get_credential_reuse_by_id / update_credential_attacker_uuid.
  - Renamed pre-existing get_credential_reuse(secret_sha256) to
    get_credential_attempts_for_secret(secret_sha256) — the new
    findings table needs the cleaner name.

Bus topics:
  - credential.captured (one per Credential upsert)
  - credential.reuse.detected (correlator-emitted on insert/grow)

Vectorstore subpackage (decnet/vectorstore/, flat layout mirroring
decnet/bus/):
  - BaseVectorStore ABC keyed by (kind, id) — kind discriminator
    means new feature families are additive, no schema migration.
  - FakeVectorStore (in-memory L2 KNN), NullVectorStore (no-op for
    DECNET_VECTORSTORE_ENABLED=false), SqliteVecVectorStore (lazy
    sqlite_vec extension load, one vec0 virtual table per kind).
  - get_vectorstore() env-driven dispatch with graceful fallback
    to FakeVectorStore when the sqlite-vec extension isn't on the
    host, so workers don't crash on a missing optional dep.

Tests: 26 new (11 cred-reuse repo, 15 vectorstore). Existing
credentials and base-repo tests updated for the rename. Total: 34
passing on the touched files.
2026-04-26 03:18:34 -04:00

9.9 KiB
Raw Blame History

TODO — credential reuse + vectorstore (handoff)

This document hands off in-progress work on the credential reuse patterns task from development/DEVELOPMENT.md (under Service-Level Behavioral Profiling) plus the decnet/vectorstore/ scaffolding that prepares the substrate for a future statistical re-identification engine over behavioral fingerprints. See /home/anti/.claude/plans/ah-excellent-alright-claude-vivid-thimble.md for the full approved plan and motivation.

Done in the previous session

Foundation is shipped + tested (26 new tests passing, no regressions):

  • Schemadecnet/web/db/models/logs.py
    • Credential.attacker_uuid: Optional[str] FK to attackers.uuid, nullable. Backfilled by the profiler post-write.
    • CredentialReuse table (UUID PK; JSON list columns for attacker_uuids, attacker_ips, deckies, services; target_count, attempt_count, confidence reserved for future fuzzy matching). Unique key: (secret_sha256, secret_kind, principal_key).
    • CredentialReuseResponse Pydantic DTO.
  • Repodecnet/web/db/sqlmodel_repo.py + repository.py
    • upsert_credential_reuse(...), list_credential_reuses(limit, offset, min_target_count, secret_kind), get_credential_reuse_by_id(id), update_credential_attacker_uuid(attacker_ip, attacker_uuid) -> int.
    • Rename: pre-existing get_credential_reuse(secret_sha256)get_credential_attempts_for_secret(secret_sha256). All callers updated.
  • Bus topicsdecnet/bus/topics.py
    • CREDENTIAL_CAPTURED = "captured" (one per Credential upsert).
    • CREDENTIAL_REUSE_DETECTED = "reuse.detected" (correlator emits on insert/grow).
    • credential(event_type) builder.
  • Vectorstoredecnet/vectorstore/ (NEW; flat layout mirroring decnet/bus/)
    • base.pyBaseVectorStore ABC, VectorRecord, Neighbor, VECTORSTORE_SCHEMA_VERSION. Methods: initialize, close, health, insert, get, delete, knn. Keyed by (kind, id).
    • fake.pyFakeVectorStore (in-memory, brute-force L2 KNN) + NullVectorStore (no-op when DECNET_VECTORSTORE_ENABLED=false).
    • sqlite_vec.pySqliteVecVectorStore; lazy-loads the sqlite_vec extension; one vec0 virtual table per kind so new feature families don't require schema migration. Per-kind dim is locked on first insert.
    • factory.pyget_vectorstore() env-driven dispatch (DECNET_VECTORSTORE_TYPE ∈ {sqlite_vec, fake}; DECNET_VECTORSTORE_ENABLED; DECNET_VECTORSTORE_PATH). On missing sqlite_vec extension: logs a warning and returns FakeVectorStore so workers don't crash.
  • Tests
    • tests/db/test_credential_reuse.py — 11 tests (upsert idempotency, list filters/pagination, FK backfill semantics, null-principal uniqueness, JSON-list merging).
    • tests/vectorstore/test_factory.py (6) + tests/vectorstore/test_fake.py (9) — factory dispatch + fallback, round-trip, dim-mismatch raises, KNN ordering, NullStore no-op.
    • Updated tests/db/test_base_repo.py and tests/db/test_credentials.py for the rename.

Not yet done — what the next agent should pick up

Tasks below are roughly in dependency order. Backend first, dashboard last (it's the largest unknown and benefits from a fresh context).

1. Profiler backfill of Credential.attacker_uuid

Smallest task; do this first to validate the FK column end-to-end.

  • File: decnet/profiler/ — find the spot where the profiler mints/updates an Attacker row from observed events. There's likely an upsert_attacker(...) call that produces the (ip, uuid) pair.
  • Add immediately after a successful upsert:
    await repo.update_credential_attacker_uuid(ip, uuid)
    
  • Test in tests/profiler/ (whatever the existing test file is) that after the profiler processes events for an IP, all Credential rows for that IP have their attacker_uuid populated. Use the pattern from tests/db/test_credential_reuse.py:: test_update_credential_attacker_uuid_backfills_only_nulls.

2. Correlator engine + worker wiring

  • File: decnet/correlation/engine.py — add correlate_credential_reuse(min_targets: int = 2) to CorrelationEngine. Signature suggested in the plan:
    SELECT secret_sha256, secret_kind, principal,
           COUNT(DISTINCT decky_name||':'||service) AS target_count
    FROM credentials
    GROUP BY secret_sha256, secret_kind, principal
    HAVING target_count >= :min_targets
    
    For each group, fetch the underlying credential rows and call repo.upsert_credential_reuse(...) per row. The repo upsert recomputes target_count from the credentials table on each update, so you don't need to pass aggregates in.
  • On insert/grow (out["inserted"] is True or out["changed"] is True), publish bus.publish(topics.credential(topics.CREDENTIAL_REUSE_DETECTED), {...}) with payload {id, secret_kind, target_count, attacker_uuids, attacker_ips, deckies, services}.
  • Worker file: decnet/correlation/main.py (or wherever CorrelationEngine is loop-driven). Subscribe to:
    • attacker.observed — re-runs reuse pass for that IP.
    • credential.captured — re-runs reuse pass for that secret.
    • Heartbeat tick every 60s as a fallback (mirror the mutator's bus-wake + slow-tick pattern).
  • Where is credential.captured emitted? Find the credential ingest path — probably decnet/collector/ or wherever repo.upsert_credential(...) is called. Add a bus.publish( topics.credential(topics.CREDENTIAL_CAPTURED), {secret_sha256, secret_kind, attacker_ip, decky, service}) after a successful upsert. Bus is fire-and-forget — don't block on it.
  • Tests:
    • tests/correlation/test_credential_reuse.py — engine emits the right CredentialReuse rows from synthetic credentials; bus event published exactly once per insert/grow.
    • Use decnet.bus.fake.FakeBus in tests; collect published events for assertion.

3. API routes — GET /api/v1/credential-reuse

  • File: probably decnet/web/api/routes/ — see how existing credentials routes are organized (recent commit feat(api): GET /credentials endpoint4566146).
  • Endpoints:
    • GET /api/v1/credential-reuse?limit=50&offset=0&min_target_count=2&secret_kind=plaintextCredentialReuseResponse (already in models).
    • GET /api/v1/credential-reuse/{id} → single row dict, 404 if missing.
  • JWT-gated like all other routes. Use the existing dependency.
  • No POST/PUT/PATCH — read-only this release. Per the feedback_schemathesis_400 memory there's no 400 contract to document since there's no body parsing.
  • Tests: tests/api/test_credential_reuse_routes.py — JWT gate, pagination, filters, 404 for missing id.

4. Dashboard — Credentials Reuse tab + drawer

The big unknown. Next agent should:

  1. Survey decnet/web/dashboard/ (React app) — how the existing Credentials view is structured (commit 4ea4b0b feat(web): Credentials view + inspector).
  2. Add a "Reuse" tab/filter that lists CredentialReuse rows sorted by target_count desc.
  3. Drawer on row-click showing decky×service breakdown, attacker_uuid list (link to /attackers/:id), timeline. Reuse the existing drawer pattern (see feedback_react_stop_propagation_native_delegation memory — backdrop click closes via target===currentTarget, never stopPropagation).
  4. On the existing Credentials list, add a "seen on N targets" badge when a credential has a corresponding CredentialReuse row, so the connection is bidirectional.

5. DEVELOPMENT.md

Tick [x] Credential reuse patterns under Service-Level Behavioral Profiling. Add a one-liner under Attacker Intelligence Collection noting decnet/vectorstore/ is scaffolded for the future statistical re-ID engine (no behavioural change yet).

Architectural decisions worth knowing

These came out of the design conversation that produced the plan; the next agent should respect them:

  • Classical statistics, not ML, for attacker re-identification. Cosine/Mahalanobis/KS-test over per-kind feature vectors, weighted voting, versioned thresholds. Reproducible, explainable, no model drift. ML is reserved for a future advisory layer behind the factory, never primary.
  • Provider factory pattern is mandatory for any new pluggable backend (storage, transport, similarity). Mirror decnet/web/db/ and decnet/bus/ — never let workers import concrete backends.
  • kind discriminator is the extension point for new feature families. Adding kind="cmd_ngram" later does not require schema changes — the vec_<kind> table is created lazily on first insert.
  • Credential.attacker_uuid is nullable on write by design — the credential capture path runs before the profiler mints Attacker, so coupling them would create a chicken-and-egg ordering bug. The profiler backfills.
  • CredentialReuse.confidence is always 1.0 today (exact-secret match). The column exists so a future fuzzy-credential pass (hunter2hunter22) can write 0.x rows without schema work.

Verification checklist for the next agent

After finishing each chunk:

  • pytest tests/<area> --timeout=30 --timeout-method=thread — must be green before moving on.
  • Don't run fuzz/bench/live/stress in the dev loop (memory: feedback_skip_heavy_tests).
  • Don't pre-clear with custom bandit/ruff flags (memory: feedback_trust_git_hooks) — the pre-commit hook is authoritative.
  • Commit per task, not batched (memory: feedback_commit_per_task). Don't add Co-Authored-By to commit messages.

Open questions to surface to ANTI before tackling §4

  • Should the dashboard "Reuse" surface live as a tab on the existing Credentials page, or as a sibling page? (The plan said tab, but worth confirming once you've seen the code.)
  • Pagination size for the reuse list — match the existing Credentials view default, or use a smaller page since the rows are wider?