realism migration: wiki sync

Document the realism content engine, the orchestrator service collapse,
and every public surface change from the migration on dev.

Page-level changes:

- Realism.md (new) — operator walkthrough of the realism library:
  ContentClass taxonomy, persona pools (topology vs global), diurnal
  gating, edit-in-place, LLM enrichment with circuit breaker, and 3%
  canary cultivation. Configuration table and CLI surface.
- Module-Reference-Core.md — new "decnet/realism/" section covering
  taxonomy / planner / naming / bodies / personas / LLM backend /
  prompts. Notes the env-var rename.
- Module-Reference-Workers.md — new "Orchestrator" section covering
  the unified worker, action-kind weights, drivers (ActivityDriver
  ABC, plant_file/read_file contract, _run_edit), and the email
  delivery surface that stayed put (events / threads / scheduler).
- Service-Bus.md — fix orchestrator topic table: emailgen producer
  attribution is gone (orchestrator owns email now), system.emailgen.*
  topics removed.
- CLI-Reference.md — new "decnet orchestrate" and "decnet realism
  import-personas" sections plus DECNET_REALISM_* / DECNET_CANARY_*
  rows in the env summary.
- Environment-Variables.md — new "Realism content engine" and
  "Canary worker" sections; starter .env.local entries appended.
- Systemd-Setup.md — bundled decnet-orchestrator.service template
  with the realism env block; explicit note that decnet-emailgen
  is gone post-migration.
- Design-Overview.md — Orchestrator + Canary rows added to the
  microservice table; Realism cross-link in the Related Pages list.
- Roadmap-and-Known-Debt.md — moved orchestrator-fake-files and
  emailgen-twin-worker entries to "Recently closed" with a pointer
  to the migration history.
- Home.md, _Sidebar.md — Realism added to the user-docs nav.
2026-04-27 17:23:41 -04:00
parent 8bdda0d9e6
commit 2a0295891c
11 changed files with 463 additions and 2 deletions

@@ -390,6 +390,79 @@ sudo decnet sniffer --daemon
---
## decnet orchestrate
`decnet/cli/orchestrator.py:10`
Run the orchestrator worker — the long-lived loop that injects
synthetic life into the running fleet (inter-decky SSH traffic, file
plants and edits, fake corporate email drops). After the realism
migration this single command covers what `decnet orchestrate` and
`decnet emailgen run` did separately; both `decnet-emailgen.service`
and the standalone CLI are gone.
**Usage:** `decnet orchestrate [flags]`
**Flags:**
| Flag | Type | Default | Description |
|---|---|---|---|
| `--interval`, `-i` | int (seconds) | `60` | Time between ticks. Each tick rolls one action across the weighted set `traffic` (45%) / `file` (45%) / `email` (10%). |
| `--daemon`, `-d` | bool | `False` | Detach to background. Skip under systemd; the unit supervises directly. |
| `--llm` / `--no-llm` | bool | env-driven | Enable or disable LLM enrichment of user-class file bodies. Default reads `$DECNET_REALISM_LLM` (any non-empty / non-`off` value enables). When the LLM is unreachable, a process-local circuit breaker trips after 3 consecutive failures and the worker falls back to deterministic templates for 60 s. |
**Examples:**
```bash
decnet orchestrate # 60s tick, env decides LLM
decnet orchestrate --interval 30 # double the rate
decnet orchestrate --no-llm # force template-only, ignore env
DECNET_REALISM_LLM=ollama decnet orchestrate
```
See [Realism](Realism) for content classes, the persona pool, and
how canary cultivation hooks into the same planner.
---
## decnet realism
`decnet/cli/realism.py:25`
Maintenance commands for the realism content engine. The only
sub-command currently is `import-personas`. There is no `decnet
realism run` — the long-lived worker is `decnet orchestrate`.
### decnet realism import-personas
Validate and install a JSON file as the host-wide global persona
pool. Used for fleet (MACVLAN/IPVLAN) and SWARM-shard deckies that
have no parent topology row. MazeNET-topology deckies use
`Topology.email_personas` instead.
**Usage:** `decnet realism import-personas <PATH> [--output PATH]`
**Flags:**
| Flag | Type | Default | Description |
|---|---|---|---|
| `--output`, `-o` | path | resolved global pool | Override the destination. Defaults to `$DECNET_REALISM_PERSONAS`, then `/etc/decnet/email_personas.json`, then `~/.decnet/email_personas.json`. |
**Examples:**
```bash
decnet realism import-personas ./personas.json
decnet realism import-personas ./personas.json -o ~/.decnet/email_personas.json
```
The validator parses every entry into the `EmailPersona` schema
(`decnet/realism/personas.py`), drops invalid entries with a
warning, refuses to write when no entries are valid, and warns when
fewer than two entries land (the email path needs at least two for
sender/recipient pairs). Master-only — gated by `DECNET_MODE=master`.
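A rough sketch of the validation flow described above, not the actual `decnet realism import-personas` implementation. It uses plain dicts instead of the real `EmailPersona` model; the required-field set, the function names, and the warning strings are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Assumption: a minimal required-field set standing in for the EmailPersona schema.
REQUIRED_FIELDS = ("name", "email", "role")


def validate_personas(raw_entries):
    """Split raw entries into (valid, warnings) without raising on bad rows."""
    valid, warnings = [], []
    for index, entry in enumerate(raw_entries):
        if not isinstance(entry, dict):
            warnings.append(f"entry {index}: not an object, dropped")
            continue
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            warnings.append(f"entry {index}: missing {', '.join(missing)}, dropped")
            continue
        valid.append(entry)
    if valid and len(valid) < 2:
        warnings.append("fewer than two personas: email needs a sender/recipient pair")
    return valid, warnings


def import_personas(source: Path, destination: Path):
    """Write the validated pool, refusing when nothing survives validation."""
    valid, warnings = validate_personas(json.loads(source.read_text()))
    if not valid:
        raise ValueError("no valid personas; refusing to write")
    destination.write_text(json.dumps(valid, indent=2))
    return warnings
```

Keeping validation separate from the write makes the "refuse when empty" rule easy to exercise without touching the filesystem.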
---
## decnet db-reset
`decnet/cli.py:930`
@@ -430,5 +503,11 @@ DECNET_DB_URL=mysql+asyncmy://... decnet db-reset --i-know-what-im-doing
| `DECNET_DB_TYPE` | `db-reset` | `mysql` or `sqlite`. |
| `DECNET_DB_URL` | `db-reset` | Full async DSN. |
| `DECNET_DB_HOST/PORT/NAME/USER/PASSWORD` | `db-reset` | Fallback DSN components. |
| `DECNET_REALISM_LLM` | `orchestrate` | LLM backend selector (`ollama` / `fake` / `off`). Default: empty (LLM disabled, templates only). |
| `DECNET_REALISM_MODEL` | `orchestrate` | Model name for the Ollama backend. Default: `llama3.1`. |
| `DECNET_REALISM_TIMEOUT` | `orchestrate` | Per-call wall-clock cap in seconds. Default: `60`. |
| `DECNET_REALISM_PERSONAS` | `orchestrate`, `realism import-personas` | Global persona pool path override. |
| `DECNET_CANARY_HTTP_BASE` | `orchestrate` (canary cultivator), `canary` worker | HTTP callback base URL for cultivated canary artifacts. |
| `DECNET_CANARY_DNS_ZONE` | `orchestrate` (canary cultivator), `canary` worker | DNS zone for callback subdomains. |
See `env.config.example` at the repo root for full defaults.

@@ -15,6 +15,8 @@ DECNET runs as a small constellation of workers around a FastAPI process. Each w
| Sniffer | `decnet sniffer --daemon` | `DECNET_EMBED_SNIFFER=1` | Passive PCAP on the decoy bridge |
| Prober | `decnet probe --daemon` | always runs | Active realism checks |
| Mutator | `decnet mutate --daemon --watch` | always runs | Runtime fleet mutation |
| Orchestrator | `decnet orchestrate` | systemd unit | Synthetic life — inter-decky SSH traffic, persona-driven file plants and edits, fake corporate email drops, occasional canary cultivation. Driven by [[Realism]]. Pre-realism this was three workers (`orchestrator` + `emailgen` + an unimplemented file driver); the migration collapsed them into one. |
| Canary | `decnet canary` | systemd unit | DNS + HTTP listeners that catch attacker callbacks from planted canary tokens. |
Every worker is also how `decnet deploy` spawns them — the deploy path shells out to `python -m decnet.cli <worker> --daemon` so there is exactly one code path, whether you run interactively or under systemd.
@@ -82,3 +84,4 @@ If you are chasing a bug across subsystem boundaries, start from those.
- [[Database-Drivers]] — SQLite vs MySQL.
- [[Environment-Variables]] — the full env surface.
- [[Systemd-Setup]] — running each worker as a supervised unit.
- [[Realism]] — the content engine driving the orchestrator's file + email branches.

@@ -130,6 +130,39 @@ Example override:
DECNET_CORS_ORIGINS=http://192.168.1.50:9090,https://dashboard.example.com
```
## Realism content engine
The orchestrator's per-tick file plants and email drops are driven by
[`decnet/realism/`](Realism). LLM enrichment for user-class file
bodies (notes, TODOs, drafts, scripts) is opt-in via these env vars;
when unset / empty / `off`, the orchestrator falls back to
deterministic templates and never calls an LLM.
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_REALISM_LLM` | string (`ollama` / `fake` / `off` / unset) | unset | No | Picks the LLM backend in `decnet/realism/llm/factory.py`. Empty / `off` / `none` / `0` / `false` / `disabled` disables enrichment; any other value enables. The orchestrator process-local circuit breaker trips after 3 consecutive failures and falls back to templates for 60 s. |
| `DECNET_REALISM_MODEL` | string | `llama3.1` | No | Model name passed to `ollama run`. Override per-host in the orchestrator unit's `EnvironmentFile`. |
| `DECNET_REALISM_TIMEOUT` | float seconds | `60` | No | Per-call wall-clock cap on `ollama run`. Hitting this raises `LLMTimeout`, which counts as one failure for the circuit breaker. |
| `DECNET_REALISM_PERSONAS` | path | `/etc/decnet/email_personas.json` | No | Override path for the host-wide persona pool consumed by fleet (MACVLAN/IPVLAN) and SWARM-shard deckies. MazeNET-topology deckies use `Topology.email_personas` instead. Path is also resolved by `decnet realism import-personas`. |
| `DECNET_REALISM_FAKE_OUTPUT` | string | (canned email body) | No | Override the canned text the `fake` backend returns. Only useful in integration tests. |
The renamed predecessors `DECNET_EMAILGEN_LLM` / `_MODEL` / `_TIMEOUT`
/ `_PERSONAS` / `_FAKE_OUTPUT` are no longer read; pre-v1 clean break.
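The enable/disable rule in the table can be sketched as below. This is an illustration, not the body of `_llm_should_enable`: the function name `llm_enabled`, the case-insensitive comparison, and the whitespace stripping are assumptions. The CLI flag takes precedence, matching the `--no-llm` example in the CLI reference.

```python
import os

# Spellings the table above lists as disabling enrichment.
_DISABLED = {"", "off", "none", "0", "false", "disabled"}


def llm_enabled(cli_flag=None, env=None):
    """Resolve the LLM switch.

    cli_flag is True/False for --llm/--no-llm, or None when neither was given.
    env is a mapping used instead of os.environ (handy for tests).
    """
    if cli_flag is not None:
        return cli_flag  # an explicit flag overrides the environment
    env = os.environ if env is None else env
    value = env.get("DECNET_REALISM_LLM", "")
    # Assumption: comparison ignores case and surrounding whitespace.
    return value.strip().lower() not in _DISABLED
```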
## Canary worker
The canary worker (`decnet canary`) and the realism canary cultivator
both read these to know how to embed callbacks in planted artifacts.
When both are unset, the cultivator can still produce passive bait
(e.g. `aws_creds`) but raises for generators that require a callback
host (`ssh_key` DNS, `mysql_dump`).
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_CANARY_HTTP_BASE` | URL (no trailing slash) | unset | No | Public-facing base for HTTP callback URLs (e.g. `https://canary.example.test`). Generators embed `<base>/c/<callback_token>` into artifacts. |
| `DECNET_CANARY_DNS_ZONE` | DNS zone | unset | No | DNS zone for callback subdomains (e.g. `canary.example.test`). Generators embed `<callback_token>.<zone>` for DNS-trip artifacts (`ssh_key` comments, `mysql_dump` replica `SOURCE_HOST`). |
| `DECNET_CANARY_BLOB_DIR` | path | `/var/lib/decnet/canary/blobs` | No | On-disk store for operator-uploaded canary blobs, deduplicated by sha256. |
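To make the two callback shapes in the table concrete, here is a small sketch. The helper names `http_callback` and `dns_callback` are hypothetical, and the defensive stripping of a stray trailing slash or dot is an assumption; the real generators may build these strings differently.

```python
def http_callback(base, token):
    """Build <base>/c/<callback_token> as described in the table above."""
    # Assumption: tolerate a trailing slash even though the variable should not carry one.
    return f"{base.rstrip('/')}/c/{token}"


def dns_callback(zone, token):
    """Build <callback_token>.<zone> for DNS-trip artifacts."""
    return f"{token}.{zone.strip('.')}"


# Illustrative values only; "tok123" is not a real token.
print(http_callback("https://canary.example.test", "tok123"))
print(dns_callback("canary.example.test", "tok123"))
```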
## Starter `.env.local`
Copy this to the project root as `.env.local`, change every placeholder, and
@@ -181,6 +214,16 @@ DECNET_DB_TYPE=sqlite
# CORS (only needed when the browser is not on the same host:port as the API)
# DECNET_CORS_ORIGINS=http://192.168.1.50:9090,https://dashboard.example.com
# Realism content engine (LLM enrichment is opt-in; templates are the fallback)
# DECNET_REALISM_LLM=ollama
# DECNET_REALISM_MODEL=llama3.1
# DECNET_REALISM_TIMEOUT=60
# DECNET_REALISM_PERSONAS=/etc/decnet/email_personas.json
# Canary worker (set both to enable callback-bearing artifact generators)
# DECNET_CANARY_HTTP_BASE=https://canary.example.test
# DECNET_CANARY_DNS_ZONE=canary.example.test
```
## Notes

@@ -33,6 +33,7 @@ Supported Python: 3.11, 3.12, 3.13. Python 3.14 is **not** supported — its new
- [Database-Drivers](Database-Drivers)
- [Systemd-Setup](Systemd-Setup)
- [Logging-and-Syslog](Logging-and-Syslog)
- [Realism](Realism)
- [Web-Dashboard](Web-Dashboard)
- [REST-API-Reference](REST-API-Reference)
- [Mutation-and-Randomization](Mutation-and-Randomization)

@@ -202,6 +202,29 @@ Shared builder functions for constructing a list of `DeckyConfig` from either CL
---
## `decnet/realism/`
Shared content engine driving the orchestrator's per-tick file plants and email drops. Importable library, no worker / systemd unit / CLI of its own — the long-lived loop that calls into it is `decnet/orchestrator/worker.py`. See [[Realism]] for the operator-facing walkthrough.
- `decnet/realism/taxonomy.py::ContentClass` — StrEnum of every artifact class the planner can pick: user (`note` / `todo` / `draft` / `script`), system (`log_cron` / `log_daemon` / `cache_tmp`), `email`, and 8 `canary_*` variants. Wire-visible — values are persisted on `synthetic_files.content_class` and used as bus-event discriminants.
- `decnet/realism/taxonomy.py::Plan` — frozen dataclass `(decky_uuid, decky_name, persona, content_class, action, target_path, mtime, body_hint, previous_body, notes)`. Construction validates that `action="edit"` carries a `previous_body`.
- `decnet/realism/diurnal.py::in_work_hours(window, now)` — wrap-around-aware `"HH:MM-HH:MM"` membership; fail-open on parse error.
- `decnet/realism/diurnal.py::sample_mtime(window, now, *, backdate_min_hours=0.5, backdate_max_days=14.0)` — backdated `datetime` whose hour-of-day falls in *window*. Drivers pass to `touch -d`.
- `decnet/realism/planner.py::pick(deckies, now, *, edit_candidate=None, rand=None)` — sync, pure-of-DB. Decides create / edit / leave-alone (10 %), weights user / system content (70 / 30 split), and at the per-pair level rolls a 3 % canary gate. Returns `Plan` or `None`.
- `decnet/realism/naming.py::make_path(content_class, persona, *, rand=None)` — per-class filename templates. Anti-regression: no namer is allowed to drop a raw decimal timestamp into a filename.
- `decnet/realism/bodies.py::make_body(content_class, persona, *, rand=None)` — deterministic body templates per class. The async sibling `make_body_with_llm(...)` calls an `LLMBackend` for user-classes (with em-dash stripping); falls back to the template on timeout / error / empty.
- `decnet/realism/bodies.py::next_iteration(content_class, persona, previous_body, *, rand=None)` — read-modify-write mutator for `EditAction`. Append-only for logs; flip-or-append for TODOs; append-line for notes / drafts / scripts.
- `decnet/realism/personas.py::EmailPersona` — pydantic model used by both file and email content. Fields: `name`, `email`, `role`, `tone`, `mannerisms`, `language`, `signature`, `active_hours`, `reply_latency`, `uses_llms_heavily`. The class name retains "Email" because every persona today has a mandatory email field, but it owns *all* persona-driven realism.
- `decnet/realism/personas_pool.py::load(*, language_default="en")` / `resolve_path()` / `reset_cache()` — JSON pool resolved from `$DECNET_REALISM_PERSONAS` → `/etc/decnet/email_personas.json` → `~/.decnet/email_personas.json`. Mtime-cached.
- `decnet/realism/llm/base.py::LLMBackend` — minimal async one-shot Protocol. `decnet/realism/llm/factory.py::get_llm` dispatches on `DECNET_REALISM_LLM` (`ollama` / `fake`). `decnet/realism/llm/circuit.py::LLMCircuitBreaker` — process-local breaker, default 3 failures → open, 60 s cooldown.
- `decnet/realism/prompts/email.py::build` — corporate-email prompt builder (lifted from the original `orchestrator.emailgen.prompt`).
- `decnet/realism/prompts/filebody.py::build` — class-conditioned prompts for user-class file bodies.
- `decnet/realism/prompts/_style.py::em_dash_rule(persona)` / `strip_em_dashes(text, persona)` — shared stylometric guard. Persona `uses_llms_heavily=True` opts out.
The persona schema, prompt builder, LLM glue, and global persona pool moved here from `decnet/orchestrator/emailgen/` in stage 2 of the realism migration. Every importer was updated; the env-var rename `DECNET_EMAILGEN_*` → `DECNET_REALISM_*` is a clean break — the predecessors are no longer read.
---
## `decnet/telemetry.py`
OpenTelemetry integration that is strictly opt-in via `DECNET_DEVELOPER_TRACING=true`. When disabled, every public export is a zero-cost no-op: `@traced` returns the unwrapped function, `get_tracer()` returns a `_NoOpTracer`, the repository wrapper returns the original repo, and injection/extraction of trace context is a nop.

@@ -435,3 +435,63 @@ Reads: nothing persistent before the first update; afterwards, the release direc
### `decnet/updater/routes/`
Reserved for handler splits once the app grows. All routes currently live in `app.py`.
---
## Orchestrator — `decnet/orchestrator/`
Single async worker that injects synthetic life into the running fleet. After the realism migration (stages 1–7) it owns three action shapes:
- **traffic** — `TrafficAction`. Inter-decky SSH banner probe via `docker exec` against the source decky's ssh container.
- **file** — `FileAction` / `EditAction`. Plant or edit-in-place a file on a destination decky. Driven by [`decnet.realism.planner`](Realism); covers inert content (notes, TODOs, drafts, scripts, cron / daemon logs, /tmp caches) and rare callback-bearing canary cultivation.
- **email** — `EmailAction`. Persona-driven RFC 2822 email dropped into a mail decky's IMAP/POP3 spool.
`decnet emailgen run` and `decnet-emailgen.service` are gone — folded into this worker by stage 5 of the realism migration.
### `decnet/orchestrator/__init__.py`
Re-exports `orchestrator_worker`.
### `decnet/orchestrator/worker.py`
- `decnet/orchestrator/worker.py::orchestrator_worker` — async entry point. Default `interval=60`. Constructs an `LLMBackend` via `decnet.realism.llm.get_llm()` when `DECNET_REALISM_LLM` is enabled, plus an `LLMCircuitBreaker` (3 failures → open, 60 s cooldown). Honours `system.orchestrator.control` for graceful shutdown.
- `decnet/orchestrator/worker.py::_one_tick` — one action pick + one driver invocation + one DB write + one fire-and-forget bus publish.
- `decnet/orchestrator/worker.py::_pick_action` — rolls the action kind (`_ACTION_WEIGHTS = (("traffic", 45), ("file", 45), ("email", 10))`) and delegates to `scheduler.pick`, `scheduler.pick_file`, or `email_scheduler.pick`. Quiet branches fall through to the other two so a (decky-set, persona-pool, mail-decky) shape that silences one branch doesn't waste the tick.
- `decnet/orchestrator/worker.py::_periodic_prune` — every `_PRUNE_EVERY_TICKS=100` ticks, calls `repo.prune_orchestrator_events(per_dst_cap=10000)` and `repo.prune_orchestrator_emails(per_decky_cap=5000)`.
- `decnet/orchestrator/worker.py::_record_synthetic_file` — after a successful FileAction plant, writes a row to `synthetic_files` keyed `(decky_uuid, path)`. On the unique-constraint collision (same path re-planted), patches the existing row's `last_modified` / `content_hash` / `last_body` and bumps `edit_count`.
- `decnet/orchestrator/worker.py::_bump_synthetic_file_after_edit` — after a successful EditAction, bumps `edit_count + 1`, refreshes `content_hash`, stores the new body. No-op when the candidate row was pruned mid-flight.
- `decnet/orchestrator/worker.py::_llm_should_enable` — resolves the LLM-enabled flag from the CLI flag, env var, defaults.
### `decnet/orchestrator/scheduler.py`
- `TrafficAction`, `FileAction`, `EditAction` dataclasses. `FileAction.content_bytes` carries binary canary artifact bytes (DOCX/PDF) so the SSH driver doesn't utf-8 round-trip them.
- `decnet/orchestrator/scheduler.py::pick` — sync, traffic-only. Returns a `TrafficAction` for ≥ 2 SSH-capable deckies or `None`.
- `decnet/orchestrator/scheduler.py::pick_file` — async. Resolves personas per decky (topology pool when source is `topology`; global `realism.personas_pool` otherwise), pre-fetches an edit candidate from `synthetic_files` ~50 % of ticks, asks `realism.planner.pick` to choose between create / edit / leave-alone, maps the resulting `Plan` to a `FileAction` (create) or `EditAction` (edit). When the picked content_class is canary, dispatches through `decnet.canary.cultivator.cultivate` and packs the bytes into `FileAction.content_bytes`.
- `decnet/orchestrator/scheduler.py::_resolve_personas` — attaches `_realism_personas` to each decky dict before passing to the planner. Topology-source deckies pull from `Topology.email_personas`; fleet/shard from the global pool.
### `decnet/orchestrator/events.py`
- `to_row(action, result)` — builds the `OrchestratorEvent(**...)` kwargs. `kind` is `"traffic"` for `TrafficAction`, `"file"` for `FileAction` and `EditAction`. `EmailAction` rows go to `OrchestratorEmail` via `decnet.orchestrator.emailgen.events.to_row`.
- `topic_for(action)` → `orchestrator.{traffic|file}.{dst_uuid}`.
- `event_type_for(action)` — discriminator string for SSE replay.
### `decnet/orchestrator/drivers/`
- `decnet/orchestrator/drivers/base.py::ActivityDriver` — ABC. `run(action) -> ActivityResult` is abstract; `plant_file(decky, path, bytes, mode, mtime)` and `read_file(decky, path) -> bytes` default to `NotImplementedError` so drivers without a write transport can opt out cleanly.
- `decnet/orchestrator/drivers/__init__.py::get_driver_for(action)` — factory: `TrafficAction`/`FileAction`/`EditAction` → `SSHDriver`; `EmailAction` → `EmailDriver`. Same lazy-import pattern as `decnet.canary.factory`.
- `decnet/orchestrator/drivers/ssh.py::SSHDriver` — concrete docker-exec driver. `plant_file` streams base64 bytes via stdin (ARG_MAX-safe; mirrors `decnet.canary.planter` commit `c17b9e0`) and applies `touch -d` for the realism-sampled mtime so files don't all stamp at wall-clock-now. `read_file` runs `docker exec ... cat <path>` and raises `FileNotFoundError` cleanly. `_run_edit` reads the previous body from `EditAction.previous_body`, calls `realism.bodies.next_iteration`, and re-plants.
- `decnet/orchestrator/drivers/email.py::EmailDriver` — RFC 2822 EML build, IMAP/POP3 spool delivery via `docker exec ... tee`. LLM call goes through `decnet.realism.llm`.
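The base64-on-stdin plant can be pictured with the sketch below. It only builds the command and the stdin payload; nothing is executed. The function name `build_plant`, the container name in the usage example, the timestamp format, and the exact shell pipeline are assumptions, not a copy of `SSHDriver.plant_file`.

```python
import base64
import shlex
from datetime import datetime


def build_plant(container, path, content, mtime):
    """Return (argv, stdin_bytes) for a docker-exec file plant."""
    quoted_path = shlex.quote(path)
    stamp = shlex.quote(mtime.strftime("%Y-%m-%d %H:%M:%S"))
    # The body travels on stdin, so its size never counts against ARG_MAX.
    script = f"base64 -d > {quoted_path} && touch -d {stamp} {quoted_path}"
    argv = ["docker", "exec", "-i", container, "sh", "-c", script]
    return argv, base64.b64encode(content)


# Hypothetical container name and path, for illustration only.
argv, payload = build_plant(
    "decky-ssh", "/home/admin/notes.txt", b"rotate keys\n", datetime(2026, 4, 20, 10, 30)
)
```

Encoding to base64 also keeps binary canary bytes (DOCX/PDF) intact, since nothing has to survive a shell-quoted argument or a utf-8 round-trip.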
### `decnet/orchestrator/emailgen/`
Email-specific delivery and threading. After stage 5 of the realism migration it has no worker / CLI / systemd unit of its own; the orchestrator drives it.
- `scheduler.py::pick(repo)` — picks a mail decky + sender + recipient, optionally a parent thread. Returns `EmailAction`.
- `events.py` — DB-row + bus-topic builders for `OrchestratorEmail` rows.
- `threads.py` — RFC 2822 thread-chain helpers (Message-ID generation, `Re: ` / `In-Reply-To` bookkeeping).
The persona schema, prompt builder, LLM glue, and global persona pool moved to `decnet/realism/` in stage 2 — see [Module Reference — Core](Module-Reference-Core) for those.
Reads: `repo.list_running_deckies`, `repo.get_topology`, `repo.pick_random_synthetic_file_for_edit`, `repo.list_synthetic_files`. Writes: `repo.record_orchestrator_event`, `repo.record_orchestrator_email`, `repo.record_synthetic_file`, `repo.update_synthetic_file`, `repo.create_canary_token` (via the cultivator).
See [Realism](Realism) for the operator-level walkthrough of content classes, persona pool, work-hours gating, edit-in-place, LLM enrichment, and canary cultivation.

Realism.md (new file, 180 lines)

@@ -0,0 +1,180 @@
# Realism
The realism content engine is what makes DECNET deckies look *lived-in*. Without it, a deployed honeypot has a frozen filesystem, mailboxes that never grow, and timestamps clustered at deploy time. Attackers notice. The realism library — `decnet/realism/` — drives the orchestrator's per-tick file plants and email drops so each decky grows files at plausible hours, with persona-conditioned names and bodies, occasionally edited in place, and very rarely seeded with callback-bearing canaries.
This is the operator-facing guide. For the underlying module surface see [Module Reference — Workers § Orchestrator](Module-Reference-Workers#orchestrator--decnetorchestrator).
## Why this exists
Pre-realism, the orchestrator's file plants looked like this on a deployed decky:
```text
$ ls /home/admin/
notes-1777254307.txt notes-1777260507.txt notes-1777266693.txt notes-1777274923.txt
$ cat notes-1777254307.txt
todo: rotate keys; check on backup task
```
Two tells:
- **Filenames are unix epochs.** No real user names a file `notes-1777315854.txt`. They write `notes.txt`, `TODO.md`, `keys.txt`.
- **Identical bodies.** Every `notes-*.txt` had the same one-line content because the generator was three hardcoded templates.
The realism engine fixes both — and adds edit-in-place, diurnal pacing, optional LLM enrichment, and canary cultivation on the same pacing.
## Architecture in one paragraph
The orchestrator ticks every 60 s and rolls a weighted action kind: 45 % SSH traffic, 45 % file plant or edit, 10 % email. The file branch asks the realism planner for a `Plan` (decky, persona, content_class, action, mtime, body hint). The planner enforces a diurnal gate (only personas in their `active_hours` window are considered), weights content classes (user > system > canary), and decides create / edit / leave-alone. The plan flows through the SSH driver, which writes the bytes via base64-on-stdin `docker exec` with a backdated mtime via `touch -d`. After a successful plant or edit the worker persists or patches a `synthetic_files` row so the next tick can edit it again. When LLM enrichment is enabled, user-class bodies get one Ollama round-trip each; on timeout / error / breaker-trip the deterministic template is the fallback.
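A minimal sketch of the per-tick weighted roll, assuming the 45 / 45 / 10 weights quoted above. `roll_action` is a hypothetical name, and the fall-through to another branch when one is quiet is deliberately left out.

```python
import random

# Weights as documented for the orchestrator's action kinds.
ACTION_WEIGHTS = (("traffic", 45), ("file", 45), ("email", 10))


def roll_action(rand=None):
    """Pick one action kind according to the documented weights."""
    rand = rand if rand is not None else random.Random()
    kinds = [kind for kind, _ in ACTION_WEIGHTS]
    weights = [weight for _, weight in ACTION_WEIGHTS]
    return rand.choices(kinds, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` makes a sequence of ticks reproducible in tests.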
## Content classes
Every planted artifact maps to exactly one `ContentClass` member (defined in `decnet/realism/taxonomy.py`).
| Class | Category | LLM-eligible | Examples |
|---|---|---|---|
| `note` | user | yes | `~/notes.txt`, `~/scratch.md`, `~/keys.txt` |
| `todo` | user | yes | `~/TODO.md`, `~/todo.txt`, `~/things.md` |
| `draft` | user | yes | `~/Q3-budget-DRAFT.md`, `~/proposal.md` |
| `script` | user | yes | `~/backup.sh`, `~/cleanup.sh`, `~/fix.py` |
| `log_cron` | system | no | `/var/log/cron.log`, `/var/log/cron.log.1`, `/var/log/cron.log.2.gz` |
| `log_daemon` | system | no | `/var/log/daemon.log`, `/var/log/syslog`, `/var/log/auth.log` |
| `cache_tmp` | system | no | `/tmp/.cache-XXXXXX` (mkstemp shape) |
| `email` | email | yes | mail-decky maildir contents |
| `canary_aws_creds` | canary | no | `~/.aws/credentials` (passive) |
| `canary_env_file` | canary | no | `~/app/.env` (HTTP callback) |
| `canary_git_config` | canary | no | `~/.git/config` (HTTP callback) |
| `canary_ssh_key` | canary | no | `~/.ssh/id_rsa` (DNS callback in comment) |
| `canary_honeydoc` | canary | no | `~/Documents/notes.html` (HTTP callback) |
| `canary_honeydoc_docx` | canary | no | `~/Documents/Q3-Operations-Review.docx` (DOCX with remote 1×1 image) |
| `canary_honeydoc_pdf` | canary | no | same as docx, PDF flavour |
| `canary_mysql_dump` | canary | no | `/var/backups/db_backup.sql` (replica-handshake DNS phone-home) |
System-class content is **deliberately** template-only. Real cron logs *are* formulaic — an LLM-authored cron log is more suspicious than a templated one. Canary classes are also template-only because their generators are deterministic by design (re-seeding from the same callback token must produce the same bytes for planter idempotency).
## Personas
Personas are fictional employees the realism engine writes *as*. Each persona carries:
- `name`, `email`, `role` — basic identity.
- `tone`: `formal` / `direct` / `casual` / `technical` / `custom` — drives the LLM voice.
- `mannerisms` — short list of stylistic tics; 1–2 are randomly picked into each prompt.
- `language` — ISO 639-1; the LLM is instructed not to code-switch.
- `active_hours`: `"HH:MM-HH:MM"`, supports wrap-around (`"22:00-06:00"`). The planner skips a persona outside its window.
- `signature` — optional verbatim block for emails.
- `uses_llms_heavily` — opt-out for the em-dash suppression (see below).
### Two pools
- **Topology pool** — `Topology.email_personas`, edited per topology via the dashboard's Persona Generation page (`/topologies/:id/personas`). MazeNET-topology deckies use this.
- **Global pool** — a JSON file on disk, edited via `/realism/personas` on the dashboard or `decnet realism import-personas <file>` on the CLI. Fleet (MACVLAN/IPVLAN) and SWARM-shard deckies use this. Path resolution: `$DECNET_REALISM_PERSONAS` → `/etc/decnet/email_personas.json` → `~/.decnet/email_personas.json`.
Files vary by user (admin vs ubuntu vs service), so a single decky can host files from multiple personas — the planner samples per tick, persists the picked persona on the `synthetic_files` row, and never binds one decky to a single fictional employee.
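The global-pool path resolution can be sketched as follows. Treat it as one plausible reading of the documented order: that the env override wins outright and that the `/etc` path is chosen only when it exists are assumptions, and `resolve_persona_path` is a hypothetical stand-in for `personas_pool.resolve_path()`.

```python
import os
from pathlib import Path

SYSTEM_POOL = Path("/etc/decnet/email_personas.json")


def resolve_persona_path(env=None, exists=Path.exists):
    """Return the first applicable global persona pool path."""
    env = os.environ if env is None else env
    override = env.get("DECNET_REALISM_PERSONAS")
    if override:
        return Path(override)  # assumption: an explicit override always wins
    if exists(SYSTEM_POOL):
        return SYSTEM_POOL  # assumption: host-wide file preferred when present
    return Path.home() / ".decnet" / "email_personas.json"
```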
### Em-dash suppression
Em-dashes (`—`) are a strong stylometric tell for LLM-authored prose. By default the prompt builder instructs the model to avoid them, and a belt-and-braces `strip_em_dashes` substitutes any that slip through. Personas with `uses_llms_heavily=true` opt out — they're meant to look like the kind of person who really does write that way.
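As an illustration of the post-processing step, the sketch below replaces any em-dash with a comma unless the persona opts out. The real `strip_em_dashes(text, persona)` takes a persona object and may substitute differently; the name `strip_em_dashes_sketch`, the boolean parameter, and the comma replacement are assumptions.

```python
import re

# U+2014 is the em-dash; surrounding whitespace is swallowed with it.
_EM_DASH = re.compile(r"\s*\u2014\s*")


def strip_em_dashes_sketch(text, uses_llms_heavily=False):
    """Replace em-dashes with ', ' unless the persona opted out."""
    if uses_llms_heavily:
        return text
    return _EM_DASH.sub(", ", text)
```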
## Diurnal gating
Two helpers in `decnet/realism/diurnal.py`:
- `in_work_hours(window, now)` — gate the planner so a persona's files only appear inside the persona's window. Wrap-around is supported. Malformed windows fail open (a typo never silences the whole fleet).
- `sample_mtime(window, now, *, backdate_min_hours=0.5, backdate_max_days=14.0)` — return a backdated `datetime` whose hour-of-day falls inside the window. Drivers pass this to `touch -d` after every plant. The hour-snap is skipped when the candidate already lands in window; when it has to snap, the result is shifted back at least one day so it stays in the past.
Net effect: a `~/TODO.md` planted during admin's 09:00–18:00 window will report mtimes inside that window, biased toward "edited recently" but never wall-clock-now.
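The wrap-around membership test is easy to get wrong, so here is a small sketch of it. This is not the code in `decnet/realism/diurnal.py`; treating the end of the window as exclusive is an assumption, while fail-open on a malformed window follows the description above.

```python
from datetime import datetime, time


def in_work_hours(window, now):
    """Return True when now falls inside an "HH:MM-HH:MM" window."""
    try:
        start_text, end_text = window.split("-")
        start = time.fromisoformat(start_text.strip())
        end = time.fromisoformat(end_text.strip())
    except (ValueError, AttributeError):
        return True  # fail open: a typo must not silence the persona
    current = now.time()
    if start <= end:
        return start <= current < end
    # Wrap-around window such as 22:00-06:00 spans midnight.
    return current >= start or current < end
```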
## Edit-in-place
When the planner picks `action="edit"`, the orchestrator reads the previous body from the `synthetic_files` row, asks `realism.bodies.next_iteration` for a plausible mutation, writes it back with a fresh in-window mtime, and bumps `edit_count + 1`. Per content_class:
- **TODO** — flip an unchecked box to `[x]`, append a new item, or both.
- **Note / draft / script** — append a new line / paragraph / comment.
- **Log_cron / log_daemon** — append a new syslog line (logs are append-only).
Canary classes, `cache_tmp`, and `email` don't support edits — the planner filters them out at candidate-selection time.
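For the TODO case, the flip-or-append mutation might look like the sketch below. It is a simplified stand-in for `realism.bodies.next_iteration`: the checkbox syntax, the 50 % probabilities, and the function name `next_todo` are assumptions.

```python
import random


def next_todo(previous_body, new_item, rand=None):
    """Flip one open checkbox, append a new item, or do both."""
    rand = rand if rand is not None else random.Random()
    lines = previous_body.splitlines()
    open_boxes = [i for i, line in enumerate(lines) if line.lstrip().startswith("- [ ]")]
    flip = bool(open_boxes) and rand.random() < 0.5
    # Always change something: append whenever no box was flipped.
    append = (not flip) or rand.random() < 0.5
    if flip:
        index = rand.choice(open_boxes)
        lines[index] = lines[index].replace("- [ ]", "- [x]", 1)
    if append:
        lines.append(f"- [ ] {new_item}")
    return "\n".join(lines) + "\n"
```

Every call changes the body, so `edit_count` and `content_hash` always move together.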
## LLM enrichment
Optional. When `DECNET_REALISM_LLM` is set to an enabling value (`ollama` or `fake`; empty or `off` disables it), the orchestrator builds an `LLMBackend` at startup and passes it through every tick. For user-class file bodies (`note` / `todo` / `draft` / `script`) the worker:
1. Builds a class-conditioned prompt (`decnet/realism/prompts/filebody.py`).
2. Calls `await asyncio.wait_for(llm.generate(prompt), timeout=DECNET_REALISM_TIMEOUT)`.
3. Falls back to the deterministic template on `LLMTimeout`, error, empty output, or non-success.
4. Strips em-dashes on the way out, unless the persona sets `uses_llms_heavily=true`.
System-class content (logs, /tmp caches) and canary classes never invoke the LLM — those are template-only by design.
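The four steps for user-class bodies can be sketched as below. This is an assumption-laden outline rather than the worker's code: `enrich_body`, its parameters, and the replacement characters used by this `strip_em_dashes` are illustrative, and the backend is assumed to expose an async `generate(prompt)` that returns a string.

```python
import asyncio

EM_DASH = "\u2014"


def strip_em_dashes(text: str) -> str:
    # Assumed substitution: a spaced em-dash becomes a comma, a bare one a hyphen.
    return text.replace(f" {EM_DASH} ", ", ").replace(EM_DASH, "-")


async def enrich_body(llm, prompt: str, template_body: str, *,
                      timeout_s: float = 60.0,
                      keep_em_dashes: bool = False) -> str:
    """Try the LLM once; fall back to the deterministic template on any failure."""
    try:
        text = await asyncio.wait_for(llm.generate(prompt), timeout=timeout_s)
    except Exception:
        # Covers the wait_for timeout and backend errors alike.
        return template_body
    if not text or not text.strip():
        return template_body          # empty output counts as a failure
    return text if keep_em_dashes else strip_em_dashes(text)
```

Here `keep_em_dashes` stands in for a persona with `uses_llms_heavily=true`. Task cancellation is deliberately not swallowed, since `asyncio.CancelledError` does not derive from `Exception`.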
### Circuit breaker
The per-call timeout protects one tick from one wedged Ollama; the breaker (`decnet/realism/llm/circuit.py`) protects the worker from a *sustained* problem. After 3 consecutive failures it flips open and short-circuits subsequent calls to the template fallback for 60 s, then half-opens to probe — success closes, failure re-opens with a fresh cooldown. State is process-local. Counters reset on any single success.
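A minimal sketch of such a breaker, assuming the thresholds quoted above (3 consecutive failures, 60 s cooldown). The class name, method names, and the injectable `clock` are hypothetical and do not describe the actual contents of `decnet/realism/llm/circuit.py`.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Process-local breaker: closed -> open -> half-open probe -> closed/open."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True when a call may go to the LLM; False means use the template."""
        if self._opened_at is None:
            return True
        # Once the cooldown has elapsed the breaker is half-open: allow a probe.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self._failures = 0            # any single success resets the counter
        self._opened_at = None

    def record_failure(self) -> None:
        if self._opened_at is not None:
            self._opened_at = self._clock()   # failed probe: fresh cooldown
            return
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()
```

The caller checks `allow()` before each LLM call and reports the outcome afterwards. This sketch does not limit the half-open state to a single in-flight probe, which a concurrent worker would likely want.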
## Canary cultivation
Roughly **3 %** of file ticks land on a canary class. The cultivator (`decnet/canary/cultivator.py`):
1. Maps the `canary_*` content_class to a generator name (`canary_aws_creds` → `aws_creds`, `canary_mysql_dump` → `mysql_dump`, …).
2. Mints a fresh `callback_token` (16 url-safe bytes).
3. Builds a `CanaryContext` from `$DECNET_CANARY_HTTP_BASE` and `$DECNET_CANARY_DNS_ZONE`.
4. Calls the generator for the bytes.
5. Persists a `canary_tokens` row before plant so the canary worker can attribute callbacks even on plant-time previews.
6. Returns a `CanaryArtifact` with the placement path resolved per-class (`~/.aws/credentials`, `~/.ssh/id_rsa`, `/var/backups/db_backup.sql`, …).
Required env: at least `DECNET_CANARY_HTTP_BASE` for HTTP-callback generators, `DECNET_CANARY_DNS_ZONE` for DNS-callback ones (`ssh_key`, `mysql_dump`). Without them the cultivator raises and the orchestrator falls through to a non-canary plan — the tick isn't wasted.
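The first three steps and the env requirement can be sketched as follows. Everything here is illustrative: `generator_for` assumes the mapping is a plain prefix strip (the real cultivator may use a lookup table), and `CanaryContextSketch`, `CanaryConfigError`, and `context_from_env` are hypothetical names, not the cultivator's API.

```python
import secrets
from dataclasses import dataclass
from typing import Mapping, Optional

# Generators that the text above lists as DNS-callback based.
DNS_GENERATORS = {"ssh_key", "mysql_dump"}


class CanaryConfigError(RuntimeError):
    """Raised when the callback host a generator needs is not configured."""


@dataclass(frozen=True)
class CanaryContextSketch:
    http_base: Optional[str]
    dns_zone: Optional[str]


def generator_for(content_class: str) -> str:
    prefix = "canary_"
    if not content_class.startswith(prefix):
        raise ValueError(f"not a canary class: {content_class}")
    return content_class[len(prefix):]


def mint_callback_token() -> str:
    # 16 random bytes, URL-safe base64 encoded (22 characters).
    return secrets.token_urlsafe(16)


def context_from_env(generator: str, env: Mapping[str, str]) -> CanaryContextSketch:
    http_base = env.get("DECNET_CANARY_HTTP_BASE") or None
    dns_zone = env.get("DECNET_CANARY_DNS_ZONE") or None
    if generator in DNS_GENERATORS and not dns_zone:
        raise CanaryConfigError("DECNET_CANARY_DNS_ZONE is required")
    if generator not in DNS_GENERATORS and not http_base:
        raise CanaryConfigError("DECNET_CANARY_HTTP_BASE is required")
    return CanaryContextSketch(http_base=http_base, dns_zone=dns_zone)
```

In this sketch the orchestrator would catch `CanaryConfigError` and plan a non-canary action for the same tick.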
Stealth: the cultivator never adds the `DECNET` literal to artifact bytes. The underlying generators are already stealth-clean. A test asserts the contract holds (`tests/canary/test_cultivator.py::test_cultivate_artifact_does_not_leak_decnet_string`).
### Volume and rate
Canary tokens are real: each carries a real DNS subdomain, a real HTTP slug, a real `canary_tokens` row, and (when tripped) a real alert. The 3 % gate is conservative on purpose: flooding the fleet makes the dashboard noisy and explodes the alert surface. To plant more canaries, raise `_CANARY_PROBABILITY` in `decnet/realism/planner.py`; to plant fewer, lower it. There is no planner-level per-decky daily cap today, but the per-`(decky_uuid, path)` UNIQUE constraint on `synthetic_files` provides natural deduplication.
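To get a feel for what a given probability means in volume, the gate can be modelled as one Bernoulli draw per file tick. The sketch below assumes that model; `is_canary_tick` and `expected_canaries` are hypothetical helpers, and only the 3 % figure comes from the text above.

```python
import random

CANARY_PROBABILITY = 0.03  # mirrors the 3 % figure quoted above


def is_canary_tick(rng: random.Random,
                   probability: float = CANARY_PROBABILITY) -> bool:
    """One independent draw per file tick."""
    return rng.random() < probability


def expected_canaries(file_ticks: int,
                      probability: float = CANARY_PROBABILITY) -> float:
    """Mean number of canary plants over a given number of file ticks."""
    return file_ticks * probability
```

Under this model, 1,000 file ticks yield about 30 canary plants on average, so doubling the probability doubles the tokens and potential alerts to track.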
## Storage
Two tables back this:
- `synthetic_files` — per-`(decky_uuid, path)` row. Carries `persona`, `content_class`, `created_at`, `last_modified`, `edit_count`, `content_hash`, `last_body` (capped at 64 KB). Schema in `decnet/web/db/models/realism.py`.
- `canary_tokens` — existing canary-subsystem table; cultivator writes one row per canary plant.
Two tables already in production receive the orchestrator's per-tick events:
- `orchestrator_events`: `kind ∈ {"traffic", "file"}`. Includes `EditAction` rows under `kind="file"`, `action="file:edit"`.
- `orchestrator_emails`: `EmailAction` rows.
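The `synthetic_files` shape described above can be mocked up with `sqlite3` to show how the `(decky_uuid, path)` uniqueness turns a repeat write into an edit. This is a sketch under assumptions: the column types, the SHA-256 hash, the upsert statement, and `record_write` are illustrative, and the real schema lives in `decnet/web/db/models/realism.py`.

```python
import hashlib
import sqlite3

MAX_BODY_BYTES = 64 * 1024  # last_body is capped at 64 KB

SCHEMA = """
CREATE TABLE synthetic_files (
    decky_uuid    TEXT NOT NULL,
    path          TEXT NOT NULL,
    persona       TEXT,
    content_class TEXT NOT NULL,
    created_at    TEXT NOT NULL,
    last_modified TEXT NOT NULL,
    edit_count    INTEGER NOT NULL DEFAULT 0,
    content_hash  TEXT NOT NULL,
    last_body     TEXT,
    UNIQUE (decky_uuid, path)
)
"""

UPSERT = """
INSERT INTO synthetic_files
    (decky_uuid, path, persona, content_class, created_at, last_modified,
     edit_count, content_hash, last_body)
VALUES (?, ?, ?, ?, ?, ?, 0, ?, ?)
ON CONFLICT (decky_uuid, path) DO UPDATE SET
    last_modified = excluded.last_modified,
    edit_count    = synthetic_files.edit_count + 1,
    content_hash  = excluded.content_hash,
    last_body     = excluded.last_body
"""


def record_write(conn: sqlite3.Connection, decky_uuid: str, path: str,
                 persona: str, content_class: str, body: str, ts: str) -> None:
    raw = body.encode("utf-8")
    # Cap the stored body; drop a trailing partial UTF-8 sequence if the cut splits one.
    capped = raw[:MAX_BODY_BYTES].decode("utf-8", errors="ignore")
    digest = hashlib.sha256(raw).hexdigest()
    conn.execute(UPSERT, (decky_uuid, path, persona, content_class,
                          ts, ts, digest, capped))
```

The first write for a path creates the row with `edit_count = 0`; later writes keep `created_at` and bump the counter, which is the deduplication effect mentioned under Volume and rate.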
## Configuration
| Env var | Default | Effect |
|---|---|---|
| `DECNET_REALISM_LLM` | unset | Backend selector (`ollama` / `fake` / `off`). Unset / `off` / `none` / `0` / `false` / `disabled` disables enrichment; any other value enables. |
| `DECNET_REALISM_MODEL` | `llama3.1` | Ollama model name. |
| `DECNET_REALISM_TIMEOUT` | `60` | Per-call wall-clock cap (seconds). |
| `DECNET_REALISM_PERSONAS` | `/etc/decnet/email_personas.json` | Global pool path override. |
| `DECNET_CANARY_HTTP_BASE` | unset | HTTP callback base (`https://canary.example.test`). |
| `DECNET_CANARY_DNS_ZONE` | unset | DNS zone (`canary.example.test`). |
Per-host overrides go in the orchestrator unit's `EnvironmentFile` (`{install_dir}/.env.local`); see [Systemd-Setup](Systemd-Setup).
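The enable/disable rule in the `DECNET_REALISM_LLM` row can be expressed as a small parser. The sketch below follows the table's wording; the function names are hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption rather than documented behaviour.

```python
from typing import Mapping, Optional

# Values the table lists as disabling enrichment (plus unset / empty).
_DISABLED = {"", "off", "none", "0", "false", "disabled"}


def llm_backend_name(env: Mapping[str, str]) -> Optional[str]:
    """Return the selected backend name, or None when enrichment is off."""
    raw = env.get("DECNET_REALISM_LLM", "").strip().lower()
    return None if raw in _DISABLED else raw


def realism_timeout_s(env: Mapping[str, str]) -> float:
    """Per-call wall-clock cap in seconds, defaulting to the table's 60."""
    return float(env.get("DECNET_REALISM_TIMEOUT", "60"))
```

Passing the environment in as a mapping (for example `os.environ`) keeps the helpers easy to test without touching process state.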
## CLI surface
- `decnet orchestrate [--llm/--no-llm]` — the long-running worker. See [CLI Reference § decnet orchestrate](CLI-Reference#decnet-orchestrate).
- `decnet realism import-personas <PATH>` — validate and install the global persona pool. See [CLI Reference § decnet realism import-personas](CLI-Reference#decnet-realism-import-personas).
## Dashboard
The dashboard's **Persona Generation** page edits both pools (per-topology and global). A synthetic-files browser ("files this decky has grown") and an LLM-status panel are open follow-ups; the data is already persisted, just not yet rendered.
## Migration history
The realism library was extracted from the original `decnet/orchestrator/emailgen/` worker in eight stages. Stage notes live in commit messages on `dev`; the highlights:
- Stage 2 — `emailgen/personas`, `emailgen/prompt`, `emailgen/global_pool`, `emailgen/llm/` moved into `decnet/realism/`. Env-var rename `DECNET_EMAILGEN_*` → `DECNET_REALISM_*` (clean break, pre-v1).
- Stage 4 — `ActivityDriver` ABC + `get_driver_for(action)` factory; `SSHDriver.plant_file` streams base64 via stdin (ARG_MAX-safe), honours `mtime`.
- Stage 5 — service collapse: `decnet-emailgen.service` deleted, `decnet emailgen run` deleted, `EmailAction` joined `TrafficAction` / `FileAction` in the orchestrator's tick. API URL `/api/v1/emailgen/personas` → `/api/v1/realism/personas`. CLI `decnet emailgen import-personas` → `decnet realism import-personas`.
For the full story, run `git log --oneline | grep realism` on `dev`.
## See also
- [Module Reference — Workers § Orchestrator](Module-Reference-Workers#orchestrator--decnetorchestrator)
- [Service-Bus](Service-Bus#topics) — `orchestrator.{traffic,file,email}.{decky_id}` topics
- [CLI Reference](CLI-Reference#decnet-orchestrate)
- [Environment Variables](Environment-Variables#realism-content-engine)
- [Security and Stealth](Security-and-Stealth) — em-dash policy, no-DECNET-literal contract

@@ -18,12 +18,17 @@ DECNET keeps its forward-looking and backward-looking planning docs inside the m
## Audits and Coverage
- `development/REALISM_AUDIT.md` — decoy realism audit notes (pre-migration; see [[Realism]] for the current content-engine surface).
- `development/COVERAGE.md` — test coverage state.
- `development/EVENTS.md` — event pipeline and schema notes.
Each of these files lives in the DECNET repo, not this wiki. Follow the links above from a working checkout.
## Recently closed
- **Orchestrator file generation looked obviously fake** — the pre-realism `decnet orchestrate` shipped three hardcoded templates that produced epoch-suffixed filenames (`notes-1777315854.txt`) with identical bodies. Closed by the realism migration (8 stages on `dev`, summarised in [[Realism#migration-history]]). Files now use plausible names, persona-conditioned bodies, edit-in-place over time, diurnal-sampled mtimes, optional LLM enrichment with a process-local circuit breaker, and 3 % canary cultivation on the same pacing.
- **Two workers for one shape** — `decnet-emailgen.service` was a sibling worker doing the same tick-driven work as `decnet orchestrate`. The migration collapsed it; one fewer systemd unit (down to 20).
---
See also: [[Home]] · [[Developer-Guide]] · [[Troubleshooting]]

@@ -148,6 +148,11 @@ Current topic families:
| `topology.{id}.status` | Mutator | `{state, reason}` |
| `decky.{id}.state` | _reserved_ | — |
| `decky.{id}.traffic` | _reserved_ | — |
| `orchestrator.traffic.{decky_id}` | Orchestrator | `{kind: "traffic", protocol: "ssh", action, src_decky_uuid, dst_decky_uuid, success, payload, ts}` — synthetic inter-decky SSH traffic generated to keep the fleet from looking suspiciously static |
| `orchestrator.file.{decky_id}` | Orchestrator | `{kind: "file", protocol: "ssh", action, dst_decky_uuid, success, payload, ts}` — synthetic file create or edit (`action="file:create"` / `"file:edit"`) performed inside a decky via `docker exec`. Driven by the realism planner; rare ticks (~3%) carry callback-bearing canary content. |
| `orchestrator.email.{decky_id}` | Orchestrator | `{kind: "email", mail_decky_uuid, thread_id, message_id, in_reply_to, sender_email, recipient_email, subject, language, success, ts}` — one fake corporate email persona-driven by the realism content engine and dropped into a mail decky's spool. `decky_id` is the mail decky (the IMAP/POP3 host serving the mailbox). Producer changed from a separate `emailgen` worker to the unified orchestrator in the realism migration; topic shape is unchanged. |
| `system.orchestrator.health` | Orchestrator | standard worker heartbeat (covers traffic + file + email branches) |
| `system.orchestrator.control` | Orchestrator | admin-originated stop intents for the orchestrator loop |
| `attacker.observed` | Correlator | first sighting; consumed by `decnet enrich` as a wake signal |
| `attacker.scored` | Profiler | post-enrichment score update; also wakes `decnet enrich` |
| `attacker.intel.enriched` | `decnet enrich` | `{attacker_ip, aggregate_verdict, providers}` after a threat-intel pass; webhook → SIEM |
@@ -160,6 +165,10 @@ Current topic families:
| `campaign.identity.assigned` | Campaign clusterer | `{campaign_uuid, identity_uuid}` — identity attached / re-attached to a campaign |
| `campaign.merged` | Campaign clusterer | `{winner_uuid, loser_uuid, identity_uuids: [...]}` — two campaigns collapsed; subscribers re-key cached references to the winner |
| `campaign.unmerged` | Campaign clusterer | `{resurrected_uuid, former_winner_uuid, identity_uuids: [...]}` — revocable-merge undo at the campaign layer |
| `canary.{token_id}.placed` | Planter (API + deploy hook) | `{token_id, decky_id, kind, instrumenter?, placement_path, placed_at}` — a canary artifact was successfully written into a decky's filesystem (or a passive token persisted) |
| `canary.{token_id}.triggered` | `decnet canary` worker | `{token_id, decky_id, src_ip, user_agent?, request_path?, dns_qname?, occurred_at, raw_headers?}` — attacker hit the HTTP slug or DNS subdomain; correlator + webhook fanout consume to attribute and forward |
| `canary.{token_id}.revoked` | API (`DELETE /tokens/{id}`) | `{token_id, decky_id, revoked_at}` — operator removed a token; subscribers may evict cached lookups by token id |
| `system.canary.health` | `decnet canary` worker | standard worker heartbeat |
| `system.log` | _reserved_ | — |
| `system.bus.health` | Bus worker heartbeat | `{ts, uptime_s}` |

@@ -330,6 +330,62 @@ WantedBy=multi-user.target
Required env: `DECNET_SYSTEM_LOGS` plus any per-service override
files referenced by the mutator. files referenced by the mutator.
### `/etc/systemd/system/decnet-orchestrator.service`
Single worker that injects synthetic life into the running fleet —
inter-decky SSH traffic, persona-driven file plants and edits, fake
corporate email drops, occasional canary cultivation. After the
realism migration this one unit covers what `decnet-orchestrator`
and `decnet-emailgen` did separately. The bundled template lives at
`deploy/decnet-orchestrator.service.j2` and includes
`SupplementaryGroups=docker` (the worker drives `docker exec`
against decky containers) plus the realism env block.
```ini
[Unit]
Description=DECNET Orchestrator (synthetic life — traffic + file plants + email)
After=network-online.target decnet-bus.service decnet-api.service
Wants=decnet-bus.service decnet-api.service
[Service]
Type=simple
User=decnet
Group=decnet
WorkingDirectory=/opt/DECNET
EnvironmentFile=-/opt/DECNET/.env.local
Environment=DECNET_SYSTEM_LOGS=/var/log/decnet/decnet.orchestrator.log
# LLM enrichment — opt-in. Leave DECNET_REALISM_LLM unset / empty to
# stay on deterministic templates. Any non-empty value enables.
Environment=DECNET_REALISM_LLM=
Environment=DECNET_REALISM_MODEL=llama3.1
Environment=DECNET_REALISM_TIMEOUT=60
Environment=DECNET_REALISM_PERSONAS=/etc/decnet/email_personas.json
ExecStart=/opt/DECNET/.venv/bin/decnet orchestrate
# docker-exec drives decky writes; needs the docker group.
SupplementaryGroups=docker
NoNewPrivileges=yes
ProtectSystem=full
ProtectHome=read-only
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
```
The realism content engine is documented separately in
[[Realism]]. Set `DECNET_CANARY_HTTP_BASE` and
`DECNET_CANARY_DNS_ZONE` in `.env.local` to enable callback-bearing
canary cultivation; without them the cultivator falls back to
non-canary plans for generators that need a callback host.
There is no separate `decnet-emailgen.service` after the migration.
If your deploy still references one, drop the entry from
`decnet.target` and remove the unit file.
## 7. Enable and start
```bash
@@ -340,7 +396,8 @@ sudo systemctl enable --now \
decnet-profiler.service \
decnet-sniffer.service \
decnet-collect.service \
decnet-mutate.service \
decnet-orchestrator.service
```
Check status and tail logs:

@@ -27,6 +27,7 @@
- [Systemd-Setup](Systemd-Setup)
- [Logging-and-Syslog](Logging-and-Syslog)
- [Service-Bus](Service-Bus)
- [Realism](Realism)
- [Web-Dashboard](Web-Dashboard)
- [REST-API-Reference](REST-API-Reference)
- [Mutation-and-Randomization](Mutation-and-Randomization)