Commit Graph

867 Commits

Author SHA1 Message Date
7483d01311 refactor(db): extract IdentitiesMixin and CampaignsMixin
Splits the AttackerIdentity and Campaign clustering reads/writes into
sqlmodel_repo/identities.py and sqlmodel_repo/campaigns.py.

Both call _deserialize_attacker (identities only) which resolves
through AttackersMixin via MRO.
2026-04-28 15:07:39 -04:00
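The mixin layout used across this refactor series (7483d01311 and the extract commits below it) can be pictured with a minimal sketch. Only the names AttackersMixin, IdentitiesMixin, SQLModelRepository and _deserialize_attacker come from the commit messages; the method bodies, the get_identity_members name and the base-class order are hypothetical.

```python
class AttackersMixin:
    # Hypothetical body: the real helper lives in sqlmodel_repo/attackers.py.
    def _deserialize_attacker(self, row: dict) -> dict:
        return {**row, "deserialized": True}


class IdentitiesMixin:
    # Hypothetical method. It does not define _deserialize_attacker itself;
    # the attribute lookup on self walks the MRO of the composed class and
    # finds the helper on AttackersMixin.
    def get_identity_members(self, rows: list) -> list:
        return [self._deserialize_attacker(r) for r in rows]


class SQLModelRepository(IdentitiesMixin, AttackersMixin):
    """Composed repository; the base order shown here is an assumption."""


repo = SQLModelRepository()
members = repo.get_identity_members([{"ip": "203.0.113.7"}])
```

The point of the design is that each mixin can live in its own file while helper methods remain reachable through the composed class.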
912171d053 refactor(db): extract AttackersMixin
Moves the 19 attacker-domain methods (core CRUD, behavior, sessions,
smtp targets, log-derived activity views) plus the _deserialize_attacker
and _deserialize_behavior helpers into sqlmodel_repo/attackers.py.
2026-04-28 15:04:51 -04:00
7ba8bafcaa refactor(db): extract CredentialsMixin
Moves the 12 credential and credential-reuse methods (incl. the
_merge_unique and _enrich_with_secret helpers) into
sqlmodel_repo/credentials.py.
2026-04-28 15:00:04 -04:00
5b1af331b9 refactor(db): extract CanaryMixin
Moves the 13 canary blob/token/trigger methods into
sqlmodel_repo/canary.py.
2026-04-28 14:55:52 -04:00
03b3c8855c refactor(db): extract OrchestratorMixin
Moves the 9 orchestrator event/email log + prune methods into
sqlmodel_repo/orchestrator.py.
2026-04-28 14:54:20 -04:00
555cd13f09 refactor(db): extract RealismMixin
Moves the 8 synthetic-file + realism-config methods into
sqlmodel_repo/realism.py.
2026-04-28 14:52:59 -04:00
9b845269c9 refactor(db): extract LogsMixin
Moves the 8 log methods (incl. get_stats_summary aggregator) into
sqlmodel_repo/logs.py. get_log_histogram remains an abstract dialect
override point; sqlite/mysql subclasses still override it via MRO.
2026-04-28 14:51:35 -04:00
a0aeba5abc refactor(db): extract FleetMixin and promote JSON helpers
Moves the 6 fleet-decky methods (incl. cross-source list_running_deckies
aggregator) into sqlmodel_repo/fleet.py. _serialize_json_fields and
_deserialize_json_fields move to _helpers.py since they're shared
across fleet, topology, and canary.
2026-04-28 14:50:01 -04:00
d989cd0461 refactor(db): extract WebhooksMixin
Moves the 9 webhook-subscription methods (CRUD + delivery
bookkeeping) into sqlmodel_repo/webhooks.py.
2026-04-28 14:47:42 -04:00
167f140b0e refactor(db): extract BountiesMixin
Moves the 5 bounty methods plus the cross-table purge_logs_and_bounties
helper into sqlmodel_repo/bounties.py.
2026-04-28 14:46:39 -04:00
c6804d79b6 refactor(db): extract DeckiesMixin
Moves the 4 decky-shard CRUD methods into sqlmodel_repo/deckies.py.
2026-04-28 14:45:15 -04:00
eebf9e4c97 refactor(db): extract AuthMixin
Moves the 7 user CRUD methods into sqlmodel_repo/auth.py.
_ensure_admin_user stays in __init__.py so DECNET_ADMIN_PASSWORD
remains addressable at the module path tests already monkeypatch.
2026-04-28 14:43:49 -04:00
99adbebe75 refactor(db): extract SwarmMixin
Moves the 7 swarm-host CRUD methods into sqlmodel_repo/swarm.py.
2026-04-28 14:42:58 -04:00
85c914e754 refactor(db): extract AttackerIntelMixin
Moves upsert_attacker_intel, get_attacker_intel_by_uuid,
and get_unenriched_attackers into sqlmodel_repo/attacker_intel.py.
Composed onto SQLModelRepository via mixin inheritance.
2026-04-28 14:40:36 -04:00
e16f47ad24 refactor(db): extract _safe_session/_detach_close to _helpers.py
Module-level session helpers move into sqlmodel_repo/_helpers.py.
__init__.py re-exports them so external import paths
(decnet.web.db.sqlmodel_repo._safe_session) keep resolving.
2026-04-28 14:38:26 -04:00
4167345d51 refactor(db): convert sqlmodel_repo.py to a package
Pure rename — the old monolithic 3505-line file becomes
decnet/web/db/sqlmodel_repo/__init__.py. No code changes.
Subsequent commits will extract per-domain mixins out of __init__.py
to mirror the topical layout used by decnet/web/db/models/.
2026-04-28 14:37:18 -04:00
6d8c90777d chore: remove vulture-flagged dead code, add whitelist
- plain.py: drop `or True` short-circuit + unreachable return; drop now-unused _HASH_HINTS
- ingester.py: drop unused `current_position` param from _flush_batch
- vulture_whitelist.py: document remaining false positives (FastAPI Depends side-effects, IMAP uid_mode where UID==seq)
2026-04-28 14:30:12 -04:00
b994250ef6 dev(ci): add CVE-2026-3219 to ignored vulns; no fix is available yet
Some checks failed
CI / Lint (ruff) (push) Successful in 13s
CI / SAST (bandit) (push) Successful in 21s
CI / Dependency audit (pip-audit) (push) Successful in 34s
CI / Test (Standard) (3.11) (push) Successful in 13m21s
CI / Test (Live) (3.11) (push) Successful in 1m45s
CI / Test (Fuzz) (3.11) (push) Failing after 3h1m24s
CI / Merge dev → testing (push) Has been cancelled
CI / Prepare Merge to Main (push) Has been cancelled
CI / Finalize Merge to Main (push) Has been cancelled
2026-04-28 14:24:57 -04:00
b4adc7246f fixed: deleted line from pyproject
Some checks failed
CI / Lint (ruff) (push) Successful in 17s
CI / SAST (bandit) (push) Successful in 24s
CI / Dependency audit (pip-audit) (push) Failing after 32s
CI / Test (Standard) (3.11) (push) Has been skipped
CI / Test (Live) (3.11) (push) Has been skipped
CI / Test (Fuzz) (3.11) (push) Has been skipped
CI / Merge dev → testing (push) Has been skipped
CI / Prepare Merge to Main (push) Has been skipped
CI / Finalize Merge to Main (push) Has been skipped
2026-04-28 13:03:14 -04:00
674ac7dd13 test(db): cover BaseRepository.update_identity_fingerprints
DummyRepo couldn't instantiate — TLS-cert fingerprint rollup added a new
abstract method without a stub here. Add the override and a call site so
the abstract pass body is hit.
2026-04-28 13:01:37 -04:00
cc6abf7256 fix(tests/stress): eliminate 0-request flakes in locust runs
Three independent issues conspired to make stress tests record 0 requests:

1. Every virtual user did /auth/login in on_start. With 1000 users in a
   spike window, bcrypt-bound logins never finished and on_start failed
   for all users — aggregated requests stayed at 0. Pre-fetch a single
   admin token in the fixture (cached per-host) and pass it via
   DECNET_STRESS_TOKEN so locust users skip the login storm.

2. Locust exits non-zero on any request failure by default, causing
   run_locust to throw away an otherwise valid stats CSV. Pass
   --exit-code-on-error 0 so per-test assertions are the only fail gate.

3. test_stress_sustained ran two locust subprocesses against the same
   uvicorn. Phase 1's keep-alive connections wedged phase 2 into 0
   recorded requests ~2/3 of the time. Refactored stress_server into a
   start_stress_server() context manager and gave each phase its own
   uvicorn.

Stable 3/3 on full suite, 3/3 on test_stress_sustained alone.
2026-04-28 13:01:11 -04:00
681931d9bb docs(roadmap): tick certificate details and three sibling roadmap items
2026-04-28 11:41:17 -04:00
72cc928ebf feat(prober-cert): roll up fingerprints onto AttackerIdentity
Brings the federation-gossip columns on AttackerIdentity to life —
ja3_hashes, hassh_hashes, and the new tls_cert_sha256 — by projecting
the union of every member observation's fingerprints JSON onto the
identity at clusterer create / link / merge time.

- decnet/profiler/identity_rollup.py: pure extract_fp_summaries()
  reads the production bounty shape (payload.fingerprint_type +
  payload.{ja3,hash,cert_sha256}) and returns deduped+sorted JSON
  list[str] per family, or None when a family has no signal so the
  column stays NULL instead of '[]'.
- BaseRepository.update_identity_fingerprints + SQLModel impl: one
  idempotent write that overwrites the three summary columns and
  bumps updated_at.
- ConnectedComponentsClusterer: after every per-component
  reconciliation (fresh-create OR existing-merge+link), recomputes
  and writes the rollup for the target identity. Wrapped in a
  best-effort helper so a write failure logs but never breaks the
  tick.
- Tests: extract_fp_summaries unit (dedup, sort determinism,
  unknown types ignored, malformed JSON, nested-stringified
  payloads, non-string values); end-to-end clusterer ticks
  populate the columns on create + on later observation links;
  no-fingerprint clusters keep the columns NULL.
2026-04-28 11:28:54 -04:00
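A rough sketch of the rollup behaviour described in 72cc928ebf: dedupe and sort per family, and return None rather than '[]' when a family has no signal. The payload keys (ja3, hash, cert_sha256) and column names come from the commit message; the fingerprint_type strings in the mapping and the input shape (a list of dicts with a "payload" entry) are assumptions, not the project's actual code.

```python
import json
from typing import Optional

# Assumed mapping: fingerprint_type -> (summary column, payload key).
# The fingerprint_type strings are guesses made for illustration.
_FAMILIES = {
    "ja3": ("ja3_hashes", "ja3"),
    "hassh": ("hassh_hashes", "hash"),
    "tls_certificate": ("tls_cert_sha256", "cert_sha256"),
}


def extract_fp_summaries(fingerprints: list) -> dict:
    """Return {column: JSON list string or None} for the three families."""
    seen = {column: set() for column, _ in _FAMILIES.values()}
    for fp in fingerprints:
        payload = fp.get("payload") if isinstance(fp, dict) else None
        if isinstance(payload, str):  # nested-stringified payload
            try:
                payload = json.loads(payload)
            except ValueError:  # malformed JSON is skipped
                continue
        if not isinstance(payload, dict):
            continue
        fp_type = payload.get("fingerprint_type")
        if not isinstance(fp_type, str) or fp_type not in _FAMILIES:
            continue  # unknown types are ignored
        column, key = _FAMILIES[fp_type]
        value = payload.get(key)
        if isinstance(value, str) and value:  # non-string values are ignored
            seen[column].add(value)
    result: dict = {}
    for column, values in seen.items():
        summary: Optional[str] = json.dumps(sorted(values)) if values else None
        result[column] = summary
    return result
```

Sorting before serialising keeps the written column deterministic, so repeated rollups over the same observations produce identical values.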
9ab43b4ea4 feat(prober-cert): UI for active TLS certificate captures
- FpCertificate renders the new cert_sha256 field (truncated, with
  full hash on hover) and a FROM line carrying the prober-side
  target_ip/port so the source is visible.
- tls_certificate payloads split on target_ip presence: prober certs
  land under ACTIVE PROBES, sniffer certs under PASSIVE FINGERPRINTS.
  Two synthetic fpType keys (tls_certificate_active /
  tls_certificate_passive) drive the bucketing without disturbing
  the on-the-wire fingerprint_type.
2026-04-28 11:23:34 -04:00
5f8149daee feat(prober-cert): capture leaf TLS cert after successful JARM
JARM probes are crafted ClientHellos with weird ciphers — they never
complete a real handshake, so the peer cert isn't reachable from
those sockets. After a non-empty JARM hash proves the port speaks
TLS, do a separate ssl.wrap_socket() against the same (ip, port) to
fetch and parse the leaf cert.

- decnet/prober/tlscert.py: fetch + parse via cryptography lib;
  swallows all connect/handshake/parse failures (returns None).
- decnet/prober/worker.py::_capture_tls_cert: emits a tls_certificate
  event with subject_cn / issuer / SANs / validity / SHA-256 +
  publishes on the bus. Wired from _jarm_phase only when JARM
  succeeds, so non-TLS ports never trigger a second connect.
- Tests cover happy path, cert-fetch failure, defense-in-depth crash,
  empty-JARM skip, publish_fn, and parser edge cases (garbage DER,
  empty bytes, missing SAN extension, non-self-signed).
2026-04-28 11:14:44 -04:00
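The follow-up connect described in 5f8149daee might look roughly like the sketch below, using only the standard library. It is not the project's tlscert.py: the function names are hypothetical, parsing with the cryptography library is omitted, and only the SHA-256 of the DER bytes is computed. The module-level ssl.wrap_socket() was removed in Python 3.12, so the sketch uses SSLContext.wrap_socket() instead.

```python
import hashlib
import socket
import ssl
from typing import Optional


def cert_sha256(der: bytes) -> str:
    """Hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der).hexdigest()


def fetch_leaf_cert_der(ip: str, port: int, timeout: float = 5.0) -> Optional[bytes]:
    """Complete a real handshake and return the peer's leaf cert, or None."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Attacker-run servers rarely present a valid chain; we only want the bytes.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw) as tls:
                return tls.getpeercert(binary_form=True)
    except (OSError, ValueError):
        # Connect, timeout and handshake failures (ssl.SSLError is an
        # OSError subclass) are swallowed, mirroring the "returns None"
        # behaviour in the commit message.
        return None
```

A caller would invoke fetch_leaf_cert_der only after a non-empty JARM hash, so ports that do not speak TLS never see the second connection.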
4749c972e5 feat(prober-cert): schema for active TLS cert capture
Adds storage for TLS certificate details collected from attacker-run
servers by the active prober (sibling to the existing JARM probe).

- AttackerIdentity.tls_cert_sha256 / Campaign.tls_cert_sha256:
  JSON list[str] columns mirroring ja3_hashes / hassh_hashes for
  federation gossip.
- ingester clause 9b: emits a 'tls_certificate' fingerprint bounty
  when a prober event carries subject_cn (disjoint from the existing
  sniffer-gated clause).
- Prober-side capture (ssl.wrap_socket follow-up after JARM) and
  profiler rollup land in sibling commits.
2026-04-28 11:09:25 -04:00
e986e81421 fix(test-schemathesis): drop unsupported_method check
The check expects 405 for any HTTP method not declared on a path.
DECNET's topology router has a static `/topologies/services` (GET only)
sibling to a parameterized `/topologies/{topology_id}` (DELETE), so a
DELETE on the static path falls through to the parameterized route and
hits auth, which returns 401 — by design. Leaking 405-vs-401 would let
unauthenticated callers enumerate valid topology UUIDs.

The same shape applies to other static/dynamic sibling pairs across
the API. The check is fundamentally incompatible with that routing
strategy; document the omission inline.
2026-04-28 10:20:43 -04:00
ccc8619387 fix(test-schemathesis): disable rate limiter in fuzz subprocess
Schemathesis fires up to 3000 examples per endpoint. POST /auth/login
caps at 10/5min per IP, so the second example onward returns 429 and
the positive_data_acceptance check flags it as RejectedPositiveData
(its allowed-status list is hardcoded in schemathesis to
2xx/401/403/404/409/5xx, so OpenAPI tweaks can't fix it).

DECNET_LIMITER_ENABLED=false exists for exactly this case (see
limiter.py docstring on stress/load testing).

Reverts the custom_openapi shim from 5d88346 / 9b1168c — the endpoint
already declares 429 in its responses= map (api_login.py:38), and the
shim turned out to address a problem that wasn't there. Drop the
companion test along with it.
2026-04-28 09:51:49 -04:00
9b1168ce0b fix(api): scope 429 OpenAPI injection to rate-limited routes
Previous commit advertised 429 on every operation. Only routes
decorated with @limiter.limit can actually return slowapi's 429 —
currently just POST /api/v1/auth/login. Documenting it elsewhere is
dishonest and would mislead clients into expecting a response the
server cannot produce.

Walk slowapi's _route_limits / _dynamic_route_limits registries to
identify decorated endpoints, match them to FastAPI routes by
{module}.{name}, and only inject 429 on those.

Existing per-route 429 declarations (e.g. SSE connection-cap on
events streams via sse_limits) are untouched.
2026-04-28 01:00:34 -04:00
5d883466a2 fix(api): advertise 429 on every operation in OpenAPI
SlowAPI middleware can short-circuit any request with 429 once a
per-route or per-IP rate limit fires (e.g. POST /api/v1/auth/login is
capped at 10/5min). The OpenAPI spec did not declare 429 on any
operation, so schemathesis flagged legitimate rate-limit responses as
RejectedPositiveData / status-code-nonconformance failures.

Override app.openapi to inject a generic 429 response object on every
HTTP operation in the generated schema. Add a contract test that fails
if any operation drops the 429 advertisement.
2026-04-28 00:58:37 -04:00
6b407e8c9c fix(tests): align stale tests with current behavior
- swarm/test_swarm_api, swarm/test_heartbeat: replace deprecated
  asyncio.get_event_loop().run_until_complete() with asyncio.run();
  the former raises in 3.11 once another test has set+closed a loop on
  the main thread.
- prober/test_prober_bus, prober/test_prober_worker: extend tcp_fingerprint
  mocks with tos/dscp/ecn/server_isn so the worker doesn't KeyError into
  the prober_error branch.
- services/test_service_isolation: collector now retries on event-stream
  errors instead of exiting; assert it stays running and cancel cleanly.
- live/test_imap_live, live/test_pop3_live: log format emits
  outcome="failure", not "failed".
- live/test_service_isolation_live: is_service_container accepts label
  OR state-name; rewrite the empty-state test against a synthetic
  unlabeled container instead of the host's real fleet.
2026-04-28 00:44:40 -04:00
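The first bullet of 6b407e8c9c, replacing asyncio.get_event_loop().run_until_complete() with asyncio.run(), amounts to the pattern sketched here. The heartbeat coroutine is a hypothetical stand-in for the code under test.

```python
import asyncio


async def heartbeat() -> int:
    # Hypothetical stand-in for an async call exercised by a test.
    await asyncio.sleep(0)
    return 42


# Old style, fragile once another test has set and closed a loop on the
# main thread:
#   result = asyncio.get_event_loop().run_until_complete(heartbeat())

# New style: asyncio.run() creates a fresh loop and closes it afterwards.
result = asyncio.run(heartbeat())
```

Because each asyncio.run() call owns its loop, the outcome no longer depends on what earlier tests did to the thread's current loop.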
8344b539c8 fix(ssh-template): drop sshd/pam_unix native chatter at rsyslog
OpenSSH's native syslog ("Failed password", "Connection from",
"Connection closed by …") and the pam_unix lines emitted from sshd's
PAM stack add no signal beyond what auth-helper already captures as
structured login_attempt events. They cluttered the dashboard and
arrived without an SD wrapper, forcing prose-IP heuristics in the
collector.

Add a `:programname, isequal, "sshd" stop` rule above the forwarding
actions in /etc/rsyslog.d/50-journal-forward.conf. pam_unix lines from
sshd inherit programname=sshd so the same rule covers both. sudo /
login / su pam_unix lines keep flowing (different programname), so
post-login privilege escalation telemetry is preserved.
2026-04-27 23:26:53 -04:00
9350ce195a fix(collector,correlation): extract attacker IP from sshd/pam free-form prose
Native sshd and pam_unix lines route through rsyslog without the
relay@55555 SD wrapper and without key=value pairs, so attacker_ip
fell through to "Unknown". Add a prose-IP fallback to both parsers:
anchored patterns (from/rhost/client/src) win first so we never pick
the local listener in "Connection from X port Y on Z port 22", with
a bare-IPv4 scan as the last resort.
2026-04-27 23:16:42 -04:00
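The two-stage fallback in 9350ce195a (anchored keywords first, bare IPv4 scan last) could be sketched as follows. The regular expressions and the function name are assumptions made for illustration; the collector's actual patterns are not shown in the commit message.

```python
import re
from typing import Optional

_IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
# Stage 1: an IPv4 address that directly follows from / rhost / client / src.
_ANCHORED = re.compile(
    rf"\b(?:from|rhost|client|src)[=\s]+({_IPV4})\b", re.IGNORECASE
)
# Stage 2: any IPv4 address at all, used only when stage 1 finds nothing.
_BARE = re.compile(rf"\b({_IPV4})\b")


def extract_attacker_ip(line: str) -> Optional[str]:
    """Return the most plausible attacker IP in a free-form log line."""
    match = _ANCHORED.search(line) or _BARE.search(line)
    return match.group(1) if match else None
```

For "Connection from X port Y on Z port 22" the anchored stage returns X, so the local listener address Z is never picked even though it is also a valid IPv4 match.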
3c571cce5a fix(correlation): prober events no longer count as attacker traversal
The prober writes events with hostname=decnet-prober and target_ip=
<the attacker being fingerprinted>. The parser pulls target_ip into
attacker_ip (it's one of _IP_FIELDS), which is correct for indexing
fingerprints under the attacker — but it had a side effect: every
fingerprinted attacker had two distinct deckies on file (the real
decoy they touched + decnet-prober) and the correlation engine's
traversals() classified that as lateral movement. Live dashboard
showed bogus "dmz-gateway -> decnet-prober" paths and TRAVERSAL
badges on attackers who'd done nothing but knock on the front door.

The prober is internal infrastructure, not a hop. Filter the
"decnet-" namespace out of distinct-decky counts and hop paths in
the engine. Fingerprints stay attached to the attacker profile via
the existing per-IP event index — just no longer as traversal.
2026-04-27 23:02:23 -04:00
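The namespace filter from 3c571cce5a can be illustrated with a small sketch. The function names and the "two or more distinct deckies" threshold are assumptions based on the description above, not the correlation engine's real code.

```python
_INTERNAL_PREFIX = "decnet-"  # internal infrastructure namespace


def decky_hops(hostnames: list) -> list:
    """Ordered distinct deckies, with internal decnet-* hosts removed."""
    hops: list = []
    for name in hostnames:
        if name.startswith(_INTERNAL_PREFIX) or name in hops:
            continue
        hops.append(name)
    return hops


def is_traversal(hostnames: list) -> bool:
    # Assumed rule: lateral movement means at least two distinct real deckies.
    return len(decky_hops(hostnames)) >= 2
```

With this filter an attacker seen on dmz-gateway and fingerprinted by decnet-prober has a single hop and is no longer flagged.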
e03a6d10a0 fix(collector): retry on event-stream errors and add periodic reconciler
Hit live on first VPS deploy: a window between the initial
client.containers.list() snapshot and the client.events() start-event
stream let topology service containers slip through, requiring an
operator restart for them to be picked up.

Two fixes:

* `_watch_events` now wraps the events() call in a retry loop with
  exponential backoff (1s -> 30s cap). A docker.errors.APIError, daemon
  reload, or SDK stream-decode hiccup used to make the executor task
  return cleanly, leaving the collector "running" with no event
  subscription. Future container starts were silently dropped until
  the unit was restarted.

* New `_reconcile_loop` async task ticks every
  DECNET_COLLECTOR_RECONCILE_S (default 30s), re-scans
  client.containers.list(), and calls _spawn for any service container
  not already in `active`. Belt to the event watcher's suspenders:
  even if a start event is dropped during a reconnect window, the
  reconciler picks it up within one cycle. Also prunes finished
  futures from `active` so the dict's bounded by current container
  count rather than agent lifetime churn.
2026-04-27 22:56:13 -04:00
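The backoff schedule named in e03a6d10a0 (1s doubling up to a 30s cap) is sketched below together with a bounded retry helper. Both function names are hypothetical, and the helper is a simplification: the real watcher presumably loops for the life of the collector rather than giving up after a fixed number of attempts.

```python
import time
from typing import Callable, Iterator


def backoff_delays(initial: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield initial, 2*initial, 4*initial, ... clamped at cap."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)


def call_with_retry(fn: Callable, max_attempts: int, sleep=time.sleep):
    """Call fn until it succeeds, sleeping with exponential backoff between tries."""
    delays = backoff_delays()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(next(delays))
```

Passing sleep as a parameter keeps the helper testable without real waiting.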
c5db1d7ba2 fix(config-ini): strip inline # and ; comments from values
The module docstring teaches inline comments — `mode = master    # or
"agent"` is the canonical example for the [decnet] section. Python's
configparser ignores those by default unless inline_comment_prefixes
is set explicitly, so the comment became part of the value and
downstream validators rejected it ("mode must be 'agent' or 'master',
got 'master                       # or \"agent\"'").

Hit live on first VPS deploy: every CLI invocation crashed at import
time with a stack trace that didn't make it obvious the docstring's
example was the trigger. Now the parser does what the docs promise.
2026-04-27 22:55:58 -04:00
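The configparser behaviour behind c5db1d7ba2 can be reproduced with the standard library alone. The INI text mirrors the docstring example quoted in the commit; whether the project passes exactly these prefixes is an assumption.

```python
import configparser

INI = '[decnet]\nmode = master    # or "agent"\n'

# Default parser: inline comments are NOT stripped, so the comment text
# becomes part of the value.
default_parser = configparser.ConfigParser()
default_parser.read_string(INI)
raw_mode = default_parser["decnet"]["mode"]

# With inline_comment_prefixes set, everything after " #" or " ;" is dropped.
fixed_parser = configparser.ConfigParser(inline_comment_prefixes=("#", ";"))
fixed_parser.read_string(INI)
clean_mode = fixed_parser["decnet"]["mode"]
```

Here raw_mode still carries the trailing comment, which is what the downstream validator rejected, while clean_mode is just the intended value.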
0b1a17b4eb fix(agent): pass --always-recreate-deps so service netns shares stay fresh
Decky service containers join their base via `network_mode:
container:<base>` and Docker binds that share at service start time. If
`docker compose up` recreates a base (e.g. ports: changes after a
forwards_l3 toggle) but decides services are unchanged, services keep
a stale FD into the destroyed namespace and end up with only `lo` — so
external traffic hits a closed port on the live base and gets RST.

Hit live on the first VPS deploy: external SSH to the dmz-gateway was
refused while sshd was listening, because base and service netns
inodes had drifted apart. `--always-recreate-deps` makes compose
rebuild every dependent whenever its base is recreated, removing the
race entirely.
2026-04-27 22:55:48 -04:00
0a525ebd37 fix(web): proxy follows DECNET_API_HOST instead of hardcoding 127.0.0.1
The dashboard's /api/* proxy hardcoded 127.0.0.1 as the target host.
That works when the API binds to a wildcard or to loopback, but
breaks the moment an operator binds the API to a specific address —
e.g. a Tailscale IP for tailnet-only deploys: the API stops listening
on loopback entirely and the proxy gets ECONNREFUSED on every request.

The web command now reads DECNET_API_HOST and falls back to loopback
only when the API is on a wildcard (0.0.0.0 / :: / unset). A new
--api-host flag overrides at the CLI level.
2026-04-27 22:55:25 -04:00
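The host selection described in 0a525ebd37 might be sketched as below. The function name and the precedence order (CLI flag, then environment, then loopback) are assumptions drawn from the commit text; only DECNET_API_HOST and --api-host are named in the source.

```python
import os
from typing import Mapping, Optional

_WILDCARDS = {"", "0.0.0.0", "::"}  # unset or wildcard binds


def resolve_proxy_host(
    cli_api_host: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the host the dashboard proxy should forward /api/* to."""
    env = os.environ if env is None else env
    if cli_api_host:  # the --api-host flag wins
        return cli_api_host
    api_host = env.get("DECNET_API_HOST", "").strip()
    # A wildcard bind also listens on loopback, so 127.0.0.1 is reachable.
    return "127.0.0.1" if api_host in _WILDCARDS else api_host
```

For a Tailscale-only deploy, DECNET_API_HOST holds the tailnet address and the proxy follows it instead of hitting a loopback port nobody is listening on.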
673bc5b819 ops(init): ship logrotate config so /var/log/decnet can't fill the disk
Without rotation, the syslog listener and per-host collector grow
/var/log/decnet/ without bound — a noisy attacker (or an active
probe storm) fills the disk in hours on a small VPS. New
deploy/logrotate.d/decnet caps at 7 daily rotations or 100 MiB,
whichever comes first, and uses copytruncate because the ingester
and forwarder hold the files open via Python and won't reopen on
a rename rotation.

Wire install / remove into `decnet init` and `decnet init --deinit`
alongside the existing tmpfiles.d / polkit handling.
2026-04-27 21:26:13 -04:00
5415e98458 sec(api): mode-gate and eager-load JWT secret in lifespan
Refuse to start decnet.web.api when DECNET_MODE=agent (unless the
operator explicitly opts into dual-role with DECNET_DISALLOW_MASTER=
false). The Typer CLI already hides master-only commands on agents,
but a misconfigured systemd unit or a direct uvicorn invocation
would bypass that — now the lifespan itself refuses, before any
worker, DB or bus comes up.

Resolve DECNET_JWT_SECRET eagerly at startup so a missing or known-
bad value fails at boot rather than on the first auth-gated request.
The lazy-load shape stays useful for non-master CLIs.
2026-04-27 21:26:03 -04:00
1a7da33375 sec(env): refuse to start master API with footgun public-binding config
Add validate_public_binding() called from the master API lifespan: when
DECNET_API_HOST is non-loopback, refuse to start if DECNET_CORS_ORIGINS
still contains a loopback origin (catches the "operator flipped to
0.0.0.0 to make it work and forgot to update CORS" footgun) or if
DECNET_CANARY_HTTP_BASE is plaintext http:// to a non-loopback host.
Log CRITICAL when DECNET_LIMITER_ENABLED=false on a public binding.
The validator no-ops under pytest so unrelated suites don't trip on it.

Add DECNET_VERIFY_HOSTNAME env knob; AgentClient and UpdaterClient
consult it when verify_hostname is None, giving production deploys
TLS hostname verification on top of the existing CA + fingerprint pin.
Default off so dev enrollments with mismatched SANs keep working.
2026-04-27 21:15:15 -04:00
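The two refusal conditions in 1a7da33375 can be sketched as a pure function that returns a list of problems. This is not the project's validate_public_binding(): the function name, signature, loopback set and error strings are hypothetical, and the CRITICAL log for a disabled limiter is left out.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlsplit

_LOOPBACK = {"127.0.0.1", "::1", "localhost"}


def _is_loopback(host: Optional[str]) -> bool:
    return (host or "").lower() in _LOOPBACK


def public_binding_errors(
    api_host: str, cors_origins: Iterable[str], canary_http_base: str
) -> List[str]:
    """Return reasons to refuse startup; empty when the config is acceptable."""
    if _is_loopback(api_host):
        return []  # checks only apply to non-loopback bindings
    errors: List[str] = []
    for origin in cors_origins:
        if _is_loopback(urlsplit(origin).hostname):
            errors.append(f"CORS origin {origin!r} is loopback on a public binding")
    base = urlsplit(canary_http_base)
    if base.scheme == "http" and not _is_loopback(base.hostname):
        errors.append("canary base URL is plaintext http:// to a non-loopback host")
    return errors
```

A lifespan hook could raise if the returned list is non-empty, which yields the fail-at-boot behaviour the commit describes.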
28e2a93355 sec(updater): harden tarball extraction and verify sha256 before extract
Reject symlinks, hardlinks, device nodes and FIFOs in update tarballs;
validate each member's resolved path stays under dest after symlink
resolution; cap uncompressed size at 256 MiB to bound gzip-bomb damage;
strip setuid/setgid bits from extracted modes.

Add an optional sha256 form field to /update and /update-self; the
master client computes and sends it on every push, the executor
refuses to extract on mismatch. mTLS already authenticates the
master, so this is defence-in-depth against in-transit corruption
and gives operators a way to pin "exactly these bytes" for vetted
releases.
2026-04-27 21:14:48 -04:00
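The checks listed in 28e2a93355 map fairly directly onto the standard tarfile module. The sketch below is an illustration under assumptions, not the updater's code: UnsafeTarball, check_members and verify_and_open are hypothetical names, and the size cap is checked against the sizes declared in the tar headers.

```python
import hashlib
import io
import os
import tarfile

MAX_UNCOMPRESSED = 256 * 1024 * 1024  # 256 MiB cap from the commit message


class UnsafeTarball(Exception):
    """Raised when an update tarball fails a safety check."""


def verify_and_open(blob: bytes, expected_sha256: str) -> tarfile.TarFile:
    # Verify the digest before touching the archive contents.
    if hashlib.sha256(blob).hexdigest() != expected_sha256.lower():
        raise UnsafeTarball("sha256 mismatch")
    return tarfile.open(fileobj=io.BytesIO(blob), mode="r:*")


def check_members(tar: tarfile.TarFile, dest: str) -> list:
    """Return members that are safe to extract under dest, or raise."""
    dest_real = os.path.realpath(dest)
    total = 0
    safe = []
    for member in tar.getmembers():
        # Only regular files and directories; this rejects symlinks,
        # hardlinks, device nodes and FIFOs.
        if not (member.isfile() or member.isdir()):
            raise UnsafeTarball(f"disallowed member type: {member.name}")
        target = os.path.realpath(os.path.join(dest_real, member.name))
        if os.path.commonpath([dest_real, target]) != dest_real:
            raise UnsafeTarball(f"path escapes destination: {member.name}")
        total += member.size
        if total > MAX_UNCOMPRESSED:
            raise UnsafeTarball("uncompressed size exceeds cap")
        member.mode &= ~0o6000  # strip setuid and setgid bits
        safe.append(member)
    return safe
```

A caller would then pass the returned list to tar.extractall(dest, members=safe), so nothing is written until every member has passed.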
1de4136ed9 style(realism-ui): adopt the persona-page design language
Both pages now layer on DeckyFleet.css + PersonaGeneration.css and use
the project's house vocabulary — fleet-root shell, page-header with
title-group + actions, btn / btn.violet / btn.ghost, info-banner with
the violet left rule, and the dim/matrix/alert text accents.

RealismConfig: inputs are flush-styled weight-input fields with a
violet focus ring; section heads carry a TOTAL badge; canary rows get
the project's amber accent; canary probability lives in a panel-bordered
slider row.

SyntheticFiles: the inline-styled table is now a styled .files-table
with the standard hover affordance, the filter-row uses tweak-group
label+select pairs, the drawer carries .drawer-eyebrow / .drawer-title
/ .meta-grid in the same style as the canary token drawer, and pager
buttons share the .btn.ghost.small treatment.

No behavioural change.
2026-04-27 18:08:58 -04:00
2950fc216e feat(realism-ui): human-readable content_class labels
Single source of truth in decnet_web/src/realism/labels.ts: maps each
ContentClass enum value to a friendly display name ("Note",
"Cron Log", "Canary · AWS Credentials", …). Used by RealismConfig
(weight tables + class filter dropdown) and SyntheticFiles (table row
+ drawer detail).

Canary classes get a subtle amber accent so the dashboard's read of
"this row is callback-bearing" doesn't depend on prefix-spotting in
mono text. Raw enum value still appears in dim mono next to the label
so an operator copy/pasting from logs or grepping the codebase still
finds it.

No backend change: the wire shape is still the snake_case enum; the
beautification is render-time only.
2026-04-27 18:04:33 -04:00
56a88d7bd4 feat(realism-ui): operator panel for planner weights + canary probability
New /realism-config page sits next to Persona Generation and
Synthetic Files under the Automation nav. Editable weight tables for
user / system / canary content classes (with live percent share),
plus a slider for canary_probability.

Wires GET/PUT /api/v1/realism/config — viewer can read; admin
required to save. Validation errors from the API are surfaced inline
rather than swallowed; the SAVE button refreshes from the server's
canonical snapshot so the operator sees exactly what landed (matters
because cross-list entries are silently dropped server-side).
2026-04-27 18:01:35 -04:00
2cc60bd677 feat(realism): operator-tunable planner weights via realism_config
New realism_config table (uuid PK + unique key) + two repo methods
(get/set) backs an admin-only GET/PUT /api/v1/realism/config surface.

The planner now exposes apply_payload(payload) / current_payload() /
reset_to_defaults() and reads its weights through mutable module
globals; pick() resolves the live values each call. Validation
catches negative weights, zero totals, out-of-range canary_probability,
unknown content_class names, and silently drops cross-list entries
(canary class on the user list, etc).

The orchestrator worker calls _refresh_realism_config(repo) on
startup and every 5 ticks (~5min at 60s interval). Operator changes
land within one refresh window with no bus signal — the simpler path
for a knob whose latency tolerance is minutes.
2026-04-27 18:00:08 -04:00
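The validation rules listed in 2cc60bd677 could take a shape like the sketch below. The class names in the two sets, the function names and the assumed [0, 1] range for canary_probability are all hypothetical; the commit message only states which conditions are caught and that cross-list entries are dropped silently.

```python
USER_CLASSES = {"note", "todo"}  # hypothetical content_class names
CANARY_CLASSES = {"canary_aws_creds"}  # hypothetical content_class names


def validate_weight_list(weights: dict, own: set, others: set) -> dict:
    """Validate one weight table and return the cleaned copy."""
    cleaned = {}
    for name, weight in weights.items():
        if name in others:
            continue  # cross-list entry: silently dropped
        if name not in own:
            raise ValueError(f"unknown content_class: {name}")
        if weight < 0:
            raise ValueError(f"negative weight for {name}")
        cleaned[name] = weight
    if sum(cleaned.values()) <= 0:
        raise ValueError("weights must sum to a positive total")
    return cleaned


def validate_canary_probability(value: float) -> float:
    # Assumed range: a probability between 0 and 1 inclusive.
    if not 0.0 <= value <= 1.0:
        raise ValueError("canary_probability out of range")
    return value
```

Returning the cleaned table rather than the input is what lets a client re-read the canonical snapshot and notice that a cross-list entry was dropped.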
da3c35c6a4 fix(realism): synthetic_files path fits MySQL utf8mb4 index cap
The (decky_uuid VARCHAR(64), path VARCHAR(1024)) UNIQUE constraint
generated a 4352-byte composite key under utf8mb4 (4 bytes/char),
busting MySQL's 3072-byte cap and crashing decnet api on init with:

    Specified key was too long; max key length is 3072 bytes

Tighten path to VARCHAR(512) — (64+512)*4 = 2304 bytes, well under
the cap. Real realism + canary placement paths are short
(/home/<persona>/Documents/<file>, ~70 chars); 512 keeps headroom
without the index hassle. Pre-v1, no migration helper.

Adds a regression test pinning the (decky_uuid + path) byte budget so
a future widening fails loudly in CI rather than at MySQL deploy
time.
2026-04-27 17:55:35 -04:00
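The byte-budget arithmetic in da3c35c6a4 can be checked with a few lines. The helper name is hypothetical and the calculation follows the commit's own simplification of 4 bytes per character under utf8mb4; it is a sketch of the kind of regression check described, not the project's test.

```python
MYSQL_MAX_KEY_BYTES = 3072  # cap quoted in the MySQL error message
UTF8MB4_BYTES_PER_CHAR = 4


def composite_key_bytes(*varchar_lengths: int) -> int:
    """Worst-case index key size for a set of utf8mb4 VARCHAR columns."""
    return sum(varchar_lengths) * UTF8MB4_BYTES_PER_CHAR


old_key = composite_key_bytes(64, 1024)  # decky_uuid + path before the fix
new_key = composite_key_bytes(64, 512)   # decky_uuid + path after the fix
```

Asserting new_key <= MYSQL_MAX_KEY_BYTES in CI is one way to make a future column widening fail before it reaches a MySQL deploy.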
397a1a111e feat(realism): LLM/breaker status on orchestrator heartbeat
Surfaces realism subsystem state on the existing worker heartbeat
extra hook (system.orchestrator.health) — no new bus topic. Payload
carries {llm_enabled, llm_backend, llm_model, llm_breaker_state}, so
the dashboard's worker panel renders a live LLM badge with a colored
breaker-state dot:

  closed (green)   — LLM healthy
  half_open (amber) — cooldown elapsed; next call is a probe
  open (red)       — short-circuiting to deterministic templates

Heartbeat is the canonical worker self-report channel; piggybacking on
extra(...) avoids a new topic family while keeping the snapshot
recomputed each beat (30s).
2026-04-27 17:51:00 -04:00
55e86f606c feat(realism-ui): synthetic files browser
New /synthetic-files page sits next to Persona Generation and Canary
Tokens under the Automation nav group. Operators get a paginated
inventory of files realism has grown across the fleet (decky, path,
persona, content_class, last_modified, edit_count, hash) with filters
on decky / persona / content_class.

Decky filter is a dropdown sourced from /deckies — never free text.
Row click opens a drawer with the body preview; the drawer surfaces a
TRUNCATED chip when the stored body is at the 64KB cap.
2026-04-27 17:48:05 -04:00
87cb61c8b2 feat(realism): synthetic-files browser API
Adds GET /api/v1/realism/synthetic-files (paginated list, filters by
decky_uuid, persona, content_class) and
GET /api/v1/realism/synthetic-files/{uuid} (single row with last_body
and a truncated:bool flag set when the stored body is at the 64KB cap).

Repo gains count_synthetic_files() and get_synthetic_file(uuid). The
list view drops last_body to keep the wire payload bounded; the detail
endpoint is the only path that returns it. Read-only — orchestrator
remains the sole writer.
2026-04-27 17:44:53 -04:00