Lays the storage and bus substrate for the "credential reuse patterns"
task in DEVELOPMENT.md and scaffolds decnet/vectorstore/ as the future
substrate for statistical attacker re-identification over behavioral
fingerprints. No correlator, profiler, API, or dashboard wiring in
this commit — see TODO.md for the handoff.

Schema:
- Credential.attacker_uuid (nullable FK to attackers.uuid),
  backfilled by the profiler post-write to avoid coupling the
  capture path to the profiler's ordering.
- CredentialReuse table: UUID PK, JSON list columns for the
  accumulating attacker_uuids/ips/deckies/services, target_count
  (the discriminative scalar), confidence reserved for a future
  fuzzy-credential pass.

Repo:
- upsert_credential_reuse / list_credential_reuses /
  get_credential_reuse_by_id / update_credential_attacker_uuid.
- Renamed pre-existing get_credential_reuse(secret_sha256) to
  get_credential_attempts_for_secret(secret_sha256); the new
  findings table needs the cleaner name.
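
The accumulate-on-upsert behaviour described for CredentialReuse can be sketched in memory as below. This is an illustration only, not the repo code: `CredentialReuseRow`, `upsert_reuse`, keying by `secret_sha256`, and treating `target_count` as the number of distinct deckies are all assumptions made for the example.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class CredentialReuseRow:
    """Hypothetical stand-in for one CredentialReuse row."""

    secret_sha256: str  # assumed lookup key, not confirmed by the commit
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # UUID PK
    attacker_uuids: list[str] = field(default_factory=list)  # JSON list columns
    ips: list[str] = field(default_factory=list)
    deckies: list[str] = field(default_factory=list)
    services: list[str] = field(default_factory=list)
    target_count: int = 0
    confidence: float | None = None  # reserved; left unset in this sketch


def _add_unique(values: list[str], item: str | None) -> None:
    """Append item only if it is set and not already present."""
    if item is not None and item not in values:
        values.append(item)


def upsert_reuse(
    table: dict[str, CredentialReuseRow],
    secret_sha256: str,
    *,
    attacker_uuid: str | None,
    ip: str,
    decky: str,
    service: str,
) -> tuple[CredentialReuseRow, bool]:
    """Insert or grow a row; the bool says whether anything changed."""
    row = table.get(secret_sha256)
    created = row is None
    if row is None:
        row = CredentialReuseRow(secret_sha256=secret_sha256)
        table[secret_sha256] = row

    lists = (row.attacker_uuids, row.ips, row.deckies, row.services)
    before = sum(len(v) for v in lists)
    _add_unique(row.attacker_uuids, attacker_uuid)
    _add_unique(row.ips, ip)
    _add_unique(row.deckies, decky)
    _add_unique(row.services, service)
    after = sum(len(v) for v in lists)

    # Assumption: target_count is the number of distinct deckies hit.
    row.target_count = len(row.deckies)
    return row, created or after > before
```

The returned flag is true only on insert or growth, which is the condition under which a correlator could publish `credential.reuse.detected`.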

Bus topics:
- credential.captured (one per Credential upsert)
- credential.reuse.detected (correlator-emitted on insert/grow)

Vectorstore subpackage (decnet/vectorstore/, flat layout mirroring
decnet/bus/):
- BaseVectorStore ABC keyed by (kind, id): the kind discriminator
  means new feature families are additive, with no schema migration.
- FakeVectorStore (in-memory L2 KNN), NullVectorStore (no-op for
  DECNET_VECTORSTORE_ENABLED=false), SqliteVecVectorStore (lazy
  sqlite_vec extension load, one vec0 virtual table per kind).
- get_vectorstore() env-driven dispatch with graceful fallback
  to FakeVectorStore when the sqlite-vec extension isn't on the
  host, so workers don't crash on a missing optional dep.
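
A minimal sketch of the (kind, id) keying and a brute-force L2 nearest-neighbour search, in the spirit of the in-memory fake. The class names, the `search` method, its signature, and the kind strings used in the usage lines are assumptions for illustration; only `insert` is named by the factory docstring, and the real BaseVectorStore interface may differ.

```python
from __future__ import annotations

import math
from abc import ABC, abstractmethod


class SketchVectorStore(ABC):
    """Hypothetical interface: vectors are addressed by (kind, id)."""

    @abstractmethod
    def insert(self, kind: str, id: str, vector: list[float]) -> None: ...

    @abstractmethod
    def search(
        self, kind: str, query: list[float], k: int = 5
    ) -> list[tuple[str, float]]: ...


class InMemoryStore(SketchVectorStore):
    """Brute-force L2 KNN over one dict per kind."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, list[float]]] = {}

    def insert(self, kind: str, id: str, vector: list[float]) -> None:
        # A new kind just creates a new bucket, so nothing is migrated.
        self._rows.setdefault(kind, {})[id] = list(vector)

    def search(
        self, kind: str, query: list[float], k: int = 5
    ) -> list[tuple[str, float]]:
        # Score every stored vector of this kind by Euclidean distance.
        scored = [
            (id_, math.dist(query, vec))
            for id_, vec in self._rows.get(kind, {}).items()
        ]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


store = InMemoryStore()
store.insert("ssh_timing", "attacker-a", [0.0, 0.0])  # kind name is made up
store.insert("ssh_timing", "attacker-b", [3.0, 4.0])
nearest = store.search("ssh_timing", [0.0, 1.0], k=1)
```

Because each kind has its own bucket, the same id can appear under several kinds without collision, and vectors of different dimensionality never get compared.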

Tests: 26 new (11 cred-reuse repo, 15 vectorstore). Existing
credentials and base-repo tests updated for the rename. Total: 34
passing on the touched files.
74 lines
2.8 KiB
Python
"""Vectorstore factory — selects a :class:`BaseVectorStore` implementation.

Dispatch keys:

* ``DECNET_VECTORSTORE_ENABLED``: ``"false"`` short-circuits to
  :class:`~decnet.vectorstore.fake.NullVectorStore`. Default ``"true"``.
* ``DECNET_VECTORSTORE_TYPE``: ``"sqlite_vec"`` (default) or
  ``"fake"``.
* ``DECNET_VECTORSTORE_PATH``: sqlite file path. Defaults to
  ``/var/lib/decnet/vectors.sqlite`` if writable, else
  ``~/.decnet/vectors.sqlite``.

Mirrors :mod:`decnet.bus.factory` and :mod:`decnet.web.db.factory`:
lazy imports inside each branch, env-driven dispatch, callers MUST go
through :func:`get_vectorstore` rather than instantiating backends.

If ``sqlite_vec`` is requested but the extension is unavailable on
this host, the factory logs a warning and returns the fake backend
instead, so the caller's code path stays valid (``insert`` etc. run in
memory) without crashing the worker on a missing optional dependency.
"""
from __future__ import annotations

import logging
import os
from typing import Any

from decnet.vectorstore.base import BaseVectorStore

LOG = logging.getLogger(__name__)


def get_vectorstore(**kwargs: Any) -> BaseVectorStore:
    if os.environ.get("DECNET_VECTORSTORE_ENABLED", "true").lower() == "false":
        from decnet.vectorstore.fake import NullVectorStore
        return NullVectorStore()

    backend = os.environ.get("DECNET_VECTORSTORE_TYPE", "sqlite_vec").lower()

    if backend == "fake":
        from decnet.vectorstore.fake import FakeVectorStore
        return FakeVectorStore()

    if backend == "sqlite_vec":
        # Probe extension availability up front so the factory can fall
        # back cleanly. Construction is cheap, but the extension load
        # only happens in initialize(); without this probe the caller
        # sees the failure too late to substitute a backend.
        try:
            import sqlite_vec  # noqa: F401
        except ImportError as e:
            LOG.warning(
                "sqlite_vec not installed (%s); falling back to FakeVectorStore. "
                "Install the sqlite-vec package or set "
                "DECNET_VECTORSTORE_TYPE=fake to silence this warning.", e,
            )
            from decnet.vectorstore.fake import FakeVectorStore
            return FakeVectorStore()
        from decnet.vectorstore.sqlite_vec import SqliteVecVectorStore
        db_path = kwargs.pop("db_path", None) or _default_db_path()
        return SqliteVecVectorStore(db_path=db_path)

    raise ValueError(f"Unsupported vectorstore type: {backend}")


def _default_db_path() -> str:
    explicit = os.environ.get("DECNET_VECTORSTORE_PATH")
    if explicit:
        return explicit
    runtime_dir = "/var/lib/decnet"
    if os.path.isdir(runtime_dir) and os.access(runtime_dir, os.W_OK):
        return f"{runtime_dir}/vectors.sqlite"
    return os.path.expanduser("~/.decnet/vectors.sqlite")
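
The dispatch order in `get_vectorstore` can be restated as a pure function that returns only a backend name, which may be convenient when unit-testing the selection logic without importing any backend. `resolve_backend_name` and its `extension_available` parameter are hypothetical and not part of the module. The sketch probes with `importlib.util.find_spec` rather than an actual import, which is a slightly weaker check than the factory's.

```python
from __future__ import annotations

import importlib.util
from typing import Mapping


def resolve_backend_name(
    env: Mapping[str, str],
    extension_available: bool | None = None,
) -> str:
    """Return "null", "fake", or "sqlite_vec" for the given environment."""
    # The kill switch wins over everything else.
    if env.get("DECNET_VECTORSTORE_ENABLED", "true").lower() == "false":
        return "null"

    backend = env.get("DECNET_VECTORSTORE_TYPE", "sqlite_vec").lower()
    if backend == "fake":
        return "fake"

    if backend == "sqlite_vec":
        if extension_available is None:
            # Probe for the optional package without importing it.
            extension_available = (
                importlib.util.find_spec("sqlite_vec") is not None
            )
        # Missing optional dependency: degrade to the in-memory fake.
        return "sqlite_vec" if extension_available else "fake"

    raise ValueError(f"Unsupported vectorstore type: {backend}")
```

Passing `extension_available` explicitly lets a test cover both the happy path and the fallback on any host, whether or not sqlite-vec is installed.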