diff --git a/CLI-Reference.md b/CLI-Reference.md index 937a733..6a7535e 100644 --- a/CLI-Reference.md +++ b/CLI-Reference.md @@ -390,6 +390,79 @@ sudo decnet sniffer --daemon --- +## decnet orchestrate + +`decnet/cli/orchestrator.py:10` + +Run the orchestrator worker — the long-lived loop that injects +synthetic life into the running fleet (inter-decky SSH traffic, file +plants and edits, fake corporate email drops). After the realism +migration this single command covers what `decnet orchestrate` and +`decnet emailgen run` did separately; both `decnet-emailgen.service` +and the standalone `decnet emailgen` CLI are gone. + +**Usage:** `decnet orchestrate [flags]` + +**Flags:** + +| Flag | Type | Default | Description | +|---|---|---|---| +| `--interval`, `-i` | int (seconds) | `60` | Time between ticks. Each tick rolls one action across the weighted set `traffic` (45%) / `file` (45%) / `email` (10%). | +| `--daemon`, `-d` | bool | `False` | Detach to background. Skip under systemd; the unit supervises directly. | +| `--llm` / `--no-llm` | bool | env-driven | Enable or disable LLM enrichment of user-class file bodies. Default reads `$DECNET_REALISM_LLM` (unset, empty, `off`, `none`, `0`, `false`, or `disabled` disables; any other value enables). When the LLM is unreachable, a process-local circuit breaker trips after 3 consecutive failures and the worker falls back to deterministic templates for 60 s. | + +**Examples:** + +```bash +decnet orchestrate # 60s tick, env decides LLM +decnet orchestrate --interval 30 # double the rate +decnet orchestrate --no-llm # force template-only, ignore env +DECNET_REALISM_LLM=ollama decnet orchestrate +``` + +See [Realism](Realism) for content classes, the persona pool, and +how canary cultivation hooks into the same planner. + +--- + +## decnet realism + +`decnet/cli/realism.py:25` + +Maintenance commands for the realism content engine. The only +sub-command currently is `import-personas`. There is no `decnet +realism run` — the long-lived worker is `decnet orchestrate`. 
+ +### decnet realism import-personas + +Validate and install a JSON file as the host-wide global persona +pool. Used for fleet (MACVLAN/IPVLAN) and SWARM-shard deckies that +have no parent topology row. MazeNET-topology deckies use +`Topology.email_personas` instead. + +**Usage:** `decnet realism import-personas [--output PATH]` + +**Flags:** + +| Flag | Type | Default | Description | +|---|---|---|---| +| `--output`, `-o` | path | resolved global pool | Override the destination. Defaults to `$DECNET_REALISM_PERSONAS`, then `/etc/decnet/email_personas.json`, then `~/.decnet/email_personas.json`. | + +**Examples:** + +```bash +decnet realism import-personas ./personas.json +decnet realism import-personas ./personas.json -o ~/.decnet/email_personas.json +``` + +The validator parses every entry into the `EmailPersona` schema +(`decnet/realism/personas.py`), drops invalid entries with a +warning, refuses to write when no entries are valid, and warns when +fewer than two entries land (the email path needs at least two for +sender/recipient pairs). Master-only — gated by `DECNET_MODE=master`. + +--- + ## decnet db-reset `decnet/cli.py:930` @@ -430,5 +503,11 @@ DECNET_DB_URL=mysql+asyncmy://... decnet db-reset --i-know-what-im-doing | `DECNET_DB_TYPE` | `db-reset` | `mysql` or `sqlite`. | | `DECNET_DB_URL` | `db-reset` | Full async DSN. | | `DECNET_DB_HOST/PORT/NAME/USER/PASSWORD` | `db-reset` | Fallback DSN components. | +| `DECNET_REALISM_LLM` | `orchestrate` | LLM backend selector (`ollama` / `fake` / `off`). Default: empty (LLM disabled, templates only). | +| `DECNET_REALISM_MODEL` | `orchestrate` | Model name for the Ollama backend. Default: `llama3.1`. | +| `DECNET_REALISM_TIMEOUT` | `orchestrate` | Per-call wall-clock cap. Default: `60`. | +| `DECNET_REALISM_PERSONAS` | `orchestrate`, `realism import-personas` | Global persona pool path override. 
| +| `DECNET_CANARY_HTTP_BASE` | `orchestrate` (canary cultivator), `canary` worker | HTTP callback base URL for cultivated canary artifacts. | +| `DECNET_CANARY_DNS_ZONE` | `orchestrate` (canary cultivator), `canary` worker | DNS zone for callback subdomains. | See `env.config.example` at the repo root for full defaults. diff --git a/Design-Overview.md b/Design-Overview.md index 1fe3831..ed3ffcb 100644 --- a/Design-Overview.md +++ b/Design-Overview.md @@ -15,6 +15,8 @@ DECNET runs as a small constellation of workers around a FastAPI process. Each w | Sniffer | `decnet sniffer --daemon` | `DECNET_EMBED_SNIFFER=1` | Passive PCAP on the decoy bridge | | Prober | `decnet probe --daemon` | always runs | Active realism checks | | Mutator | `decnet mutate --daemon --watch` | always runs | Runtime fleet mutation | +| Orchestrator | `decnet orchestrate` | systemd unit | Synthetic life — inter-decky SSH traffic, persona-driven file plants and edits, fake corporate email drops, occasional canary cultivation. Driven by [[Realism]]. Pre-realism this was three workers (`orchestrator` + `emailgen` + an unimplemented file driver); the migration collapsed them into one. | +| Canary | `decnet canary` | systemd unit | DNS + HTTP listeners that catch attacker callbacks from planted canary tokens. | Every worker is also how `decnet deploy` spawns them — the deploy path shells out to `python -m decnet.cli --daemon` so there is exactly one code path, whether you run interactively or under systemd. @@ -82,3 +84,4 @@ If you are chasing a bug across subsystem boundaries, start from those. - [[Database-Drivers]] — SQLite vs MySQL. - [[Environment-Variables]] — the full env surface. - [[Systemd-Setup]] — running each worker as a supervised unit. +- [[Realism]] — the content engine driving the orchestrator's file + email branches. 
diff --git a/Environment-Variables.md b/Environment-Variables.md index ab34641..420ad0d 100644 --- a/Environment-Variables.md +++ b/Environment-Variables.md @@ -130,6 +130,39 @@ Example override: DECNET_CORS_ORIGINS=http://192.168.1.50:9090,https://dashboard.example.com ``` +## Realism content engine + +The orchestrator's per-tick file plants and email drops are driven by +[`decnet/realism/`](Realism). LLM enrichment for user-class file +bodies (notes, TODOs, drafts, scripts) is opt-in via these env vars; +when unset / empty / `off`, the orchestrator falls back to +deterministic templates and never calls an LLM. + +| Name | Type | Default | Required | Consequence | +|------|------|---------|----------|-------------| +| `DECNET_REALISM_LLM` | string (`ollama` / `fake` / `off` / unset) | unset | No | Picks the LLM backend in `decnet/realism/llm/factory.py`. Empty / `off` / `none` / `0` / `false` / `disabled` disables enrichment; any other value enables. The orchestrator process-local circuit breaker trips after 3 consecutive failures and falls back to templates for 60 s. | +| `DECNET_REALISM_MODEL` | string | `llama3.1` | No | Model name passed to `ollama run`. Override per-host in the orchestrator unit's `EnvironmentFile`. | +| `DECNET_REALISM_TIMEOUT` | float seconds | `60` | No | Per-call wall-clock cap on `ollama run`. Hitting this raises `LLMTimeout`, which counts as one failure for the circuit breaker. | +| `DECNET_REALISM_PERSONAS` | path | `/etc/decnet/email_personas.json` | No | Override path for the host-wide persona pool consumed by fleet (MACVLAN/IPVLAN) and SWARM-shard deckies. MazeNET-topology deckies use `Topology.email_personas` instead. Path is also resolved by `decnet realism import-personas`. | +| `DECNET_REALISM_FAKE_OUTPUT` | string | (canned email body) | No | Override the canned text the `fake` backend returns. Only useful in integration tests. 
| + +The renamed predecessors `DECNET_EMAILGEN_LLM` / `_MODEL` / `_TIMEOUT` +/ `_PERSONAS` / `_FAKE_OUTPUT` are no longer read; pre-v1 clean break. + +## Canary worker + +The canary worker (`decnet canary`) and the realism canary cultivator +both read these to know how to embed callbacks in planted artifacts. +When both are unset, the cultivator can still produce passive bait +(e.g. `aws_creds`) but raises for generators that require a callback +host (`ssh_key` DNS, `mysql_dump`). + +| Name | Type | Default | Required | Consequence | +|------|------|---------|----------|-------------| +| `DECNET_CANARY_HTTP_BASE` | URL (no trailing slash) | unset | No | Public-facing base for HTTP callback URLs (e.g. `https://canary.example.test`). Generators embed `/c/` into artifacts. | +| `DECNET_CANARY_DNS_ZONE` | DNS zone | unset | No | DNS zone for callback subdomains (e.g. `canary.example.test`). Generators embed `.` for DNS-trip artifacts (`ssh_key` comments, `mysql_dump` replica `SOURCE_HOST`). | +| `DECNET_CANARY_BLOB_DIR` | path | `/var/lib/decnet/canary/blobs` | No | On-disk store for operator-uploaded canary blobs, deduplicated by sha256. 
| + ## Starter `.env.local` Copy this to the project root as `.env.local`, change every placeholder, and @@ -181,6 +214,16 @@ DECNET_DB_TYPE=sqlite # CORS (only needed when the browser is not on the same host:port as the API) # DECNET_CORS_ORIGINS=http://192.168.1.50:9090,https://dashboard.example.com + +# Realism content engine (LLM enrichment is opt-in; templates are the fallback) +# DECNET_REALISM_LLM=ollama +# DECNET_REALISM_MODEL=llama3.1 +# DECNET_REALISM_TIMEOUT=60 +DECNET_REALISM_PERSONAS=/etc/decnet/email_personas.json + +# Canary worker (set both to enable callback-bearing artifact generators) +# DECNET_CANARY_HTTP_BASE=https://canary.example.test +# DECNET_CANARY_DNS_ZONE=canary.example.test ``` ## Notes diff --git a/Home.md b/Home.md index f7458be..ae2eb7c 100644 --- a/Home.md +++ b/Home.md @@ -33,6 +33,7 @@ Supported Python: 3.11, 3.12, 3.13. Python 3.14 is **not** supported — its new - [Database-Drivers](Database-Drivers) - [Systemd-Setup](Systemd-Setup) - [Logging-and-Syslog](Logging-and-Syslog) +- [Realism](Realism) - [Web-Dashboard](Web-Dashboard) - [REST-API-Reference](REST-API-Reference) - [Mutation-and-Randomization](Mutation-and-Randomization) diff --git a/Module-Reference-Core.md b/Module-Reference-Core.md index 3799a1d..bfc4916 100644 --- a/Module-Reference-Core.md +++ b/Module-Reference-Core.md @@ -202,6 +202,29 @@ Shared builder functions for constructing a list of `DeckyConfig` from either CL --- +## `decnet/realism/` + +Shared content engine driving the orchestrator's per-tick file plants and email drops. Importable library, no worker / systemd unit / CLI of its own — the long-lived loop that calls into it is `decnet/orchestrator/worker.py`. See [[Realism]] for the operator-facing walkthrough. + +- `decnet/realism/taxonomy.py::ContentClass` — StrEnum of every artifact class the planner can pick: user (`note` / `todo` / `draft` / `script`), system (`log_cron` / `log_daemon` / `cache_tmp`), `email`, and 8 `canary_*` variants. 
Wire-visible — values are persisted on `synthetic_files.content_class` and used as bus-event discriminants. +- `decnet/realism/taxonomy.py::Plan` — frozen dataclass `(decky_uuid, decky_name, persona, content_class, action, target_path, mtime, body_hint, previous_body, notes)`. Construction validates that `action="edit"` carries a `previous_body`. +- `decnet/realism/diurnal.py::in_work_hours(window, now)` — wrap-around-aware `"HH:MM-HH:MM"` membership; fail-open on parse error. +- `decnet/realism/diurnal.py::sample_mtime(window, now, *, backdate_min_hours=0.5, backdate_max_days=14.0)` — backdated `datetime` whose hour-of-day falls in *window*. Drivers pass to `touch -d`. +- `decnet/realism/planner.py::pick(deckies, now, *, edit_candidate=None, rand=None)` — sync, pure-of-DB. Decides create / edit / leave-alone (10 %), weights user / system content (70 / 30 split), and at the per-pair level rolls a 3 % canary gate. Returns `Plan` or `None`. +- `decnet/realism/naming.py::make_path(content_class, persona, *, rand=None)` — per-class filename templates. Anti-regression: no namer is allowed to drop a raw decimal timestamp into a filename. +- `decnet/realism/bodies.py::make_body(content_class, persona, *, rand=None)` — deterministic body templates per class. The async sibling `make_body_with_llm(...)` calls an `LLMBackend` for user-classes (with em-dash stripping); falls back to the template on timeout / error / empty. +- `decnet/realism/bodies.py::next_iteration(content_class, persona, previous_body, *, rand=None)` — read-modify-write mutator for `EditAction`. Append-only for logs; flip-or-append for TODOs; append-line for notes / drafts / scripts. +- `decnet/realism/personas.py::EmailPersona` — pydantic model used by both file and email content. Fields: `name`, `email`, `role`, `tone`, `mannerisms`, `language`, `signature`, `active_hours`, `reply_latency`, `uses_llms_heavily`. 
The class name retains "Email" because every persona today has a mandatory email field, but it owns *all* persona-driven realism. +- `decnet/realism/personas_pool.py::load(*, language_default="en")` / `resolve_path()` / `reset_cache()` — JSON pool resolved from `$DECNET_REALISM_PERSONAS` → `/etc/decnet/email_personas.json` → `~/.decnet/email_personas.json`. Mtime-cached. +- `decnet/realism/llm/base.py::LLMBackend` — minimal async one-shot Protocol. `decnet/realism/llm/factory.py::get_llm` dispatches on `DECNET_REALISM_LLM` (`ollama` / `fake`). `decnet/realism/llm/circuit.py::LLMCircuitBreaker` — process-local breaker, default 3 failures → open, 60 s cooldown. +- `decnet/realism/prompts/email.py::build` — corporate-email prompt builder (lifted from the original `orchestrator.emailgen.prompt`). +- `decnet/realism/prompts/filebody.py::build` — class-conditioned prompts for user-class file bodies. +- `decnet/realism/prompts/_style.py::em_dash_rule(persona)` / `strip_em_dashes(text, persona)` — shared stylometric guard. Persona `uses_llms_heavily=True` opts out. + +The persona schema, prompt builder, LLM glue, and global persona pool moved here from `decnet/orchestrator/emailgen/` in stage 2 of the realism migration. Every importer was updated; the env-var rename `DECNET_EMAILGEN_*` → `DECNET_REALISM_*` is a clean break — the predecessors are no longer read. + +--- + ## `decnet/telemetry.py` OpenTelemetry integration that is strictly opt-in via `DECNET_DEVELOPER_TRACING=true`. When disabled, every public export is a zero-cost no-op: `@traced` returns the unwrapped function, `get_tracer()` returns a `_NoOpTracer`, the repository wrapper returns the original repo, and injection/extraction of trace context is a nop. 
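
The opt-in behaviour described for `decnet/telemetry.py` can be sketched as follows. This is a minimal illustration only: it assumes `traced` is applied as a bare decorator and that the flag is read from the environment at decoration time. The helper `_tracing_enabled` is hypothetical, and the wrapper body is a placeholder rather than the real OpenTelemetry integration.

```python
import functools
import os


def _tracing_enabled() -> bool:
    # Hypothetical helper: the source only says tracing is opt-in via
    # DECNET_DEVELOPER_TRACING=true.
    return os.environ.get("DECNET_DEVELOPER_TRACING", "").lower() == "true"


def traced(fn):
    """Return fn untouched when tracing is off, so the disabled path costs nothing."""
    if not _tracing_enabled():
        return fn

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Placeholder: a real implementation would open a span around the call.
        return fn(*args, **kwargs)

    return wrapper
```

Because the disabled branch returns the original function object, there is no extra stack frame or per-call check on the hot path.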
diff --git a/Module-Reference-Workers.md b/Module-Reference-Workers.md index ba65f18..b1dc7c8 100644 --- a/Module-Reference-Workers.md +++ b/Module-Reference-Workers.md @@ -435,3 +435,63 @@ Reads: nothing persistent before the first update; afterwards, the release direc ### `decnet/updater/routes/` Reserved for handler splits once the app grows. All routes currently live in `app.py`. + +--- + +## Orchestrator — `decnet/orchestrator/` + +Single async worker that injects synthetic life into the running fleet. After the realism migration (stages 1–7) it owns three action shapes: + +- **traffic** — `TrafficAction`. Inter-decky SSH banner probe via `docker exec` against the source decky's ssh container. +- **file** — `FileAction` / `EditAction`. Plant or edit-in-place a file on a destination decky. Driven by [`decnet.realism.planner`](Realism); covers inert content (notes, TODOs, drafts, scripts, cron / daemon logs, /tmp caches) and rare callback-bearing canary cultivation. +- **email** — `EmailAction`. Persona-driven RFC 2822 email dropped into a mail decky's IMAP/POP3 spool. + +`decnet emailgen run` and `decnet-emailgen.service` are gone — folded into this worker by stage 5 of the realism migration. + +### `decnet/orchestrator/__init__.py` + +Re-exports `orchestrator_worker`. + +### `decnet/orchestrator/worker.py` + +- `decnet/orchestrator/worker.py::orchestrator_worker` — async entry point. Default `interval=60`. Constructs an `LLMBackend` via `decnet.realism.llm.get_llm()` when `DECNET_REALISM_LLM` is enabled, plus an `LLMCircuitBreaker` (3 failures → open, 60 s cooldown). Honours `system.orchestrator.control` for graceful shutdown. +- `decnet/orchestrator/worker.py::_one_tick` — one action pick + one driver invocation + one DB write + one fire-and-forget bus publish. 
+- `decnet/orchestrator/worker.py::_pick_action` — rolls the action kind (`_ACTION_WEIGHTS = (("traffic", 45), ("file", 45), ("email", 10))`) and delegates to `scheduler.pick`, `scheduler.pick_file`, or `email_scheduler.pick`. Quiet branches fall through to the other two so a (decky-set, persona-pool, mail-decky) shape that silences one branch doesn't waste the tick. +- `decnet/orchestrator/worker.py::_periodic_prune` — every `_PRUNE_EVERY_TICKS=100` ticks, calls `repo.prune_orchestrator_events(per_dst_cap=10000)` and `repo.prune_orchestrator_emails(per_decky_cap=5000)`. +- `decnet/orchestrator/worker.py::_record_synthetic_file` — after a successful FileAction plant, writes a row to `synthetic_files` keyed `(decky_uuid, path)`. On the unique-constraint collision (same path re-planted), patches the existing row's `last_modified` / `content_hash` / `last_body` and bumps `edit_count`. +- `decnet/orchestrator/worker.py::_bump_synthetic_file_after_edit` — after a successful EditAction, bumps `edit_count + 1`, refreshes `content_hash`, stores the new body. No-op when the candidate row was pruned mid-flight. +- `decnet/orchestrator/worker.py::_llm_should_enable` — resolves the LLM-enabled flag from the CLI flag, env var, defaults. + +### `decnet/orchestrator/scheduler.py` + +- `TrafficAction`, `FileAction`, `EditAction` dataclasses. `FileAction.content_bytes` carries binary canary artifact bytes (DOCX/PDF) so the SSH driver doesn't utf-8 round-trip them. +- `decnet/orchestrator/scheduler.py::pick` — sync, traffic-only. Returns a `TrafficAction` for ≥ 2 SSH-capable deckies or `None`. +- `decnet/orchestrator/scheduler.py::pick_file` — async. Resolves personas per decky (topology pool when source is `topology`; global `realism.personas_pool` otherwise), pre-fetches an edit candidate from `synthetic_files` ~50 % of ticks, asks `realism.planner.pick` to choose between create / edit / leave-alone, maps the resulting `Plan` to a `FileAction` (create) or `EditAction` (edit). 
When the picked content_class is canary, dispatches through `decnet.canary.cultivator.cultivate` and packs the bytes into `FileAction.content_bytes`. +- `decnet/orchestrator/scheduler.py::_resolve_personas` — attaches `_realism_personas` to each decky dict before passing to the planner. Topology-source deckies pull from `Topology.email_personas`; fleet/shard from the global pool. + +### `decnet/orchestrator/events.py` + +- `to_row(action, result)` — builds the `OrchestratorEvent(**...)` kwargs. `kind` is `"traffic"` for `TrafficAction`, `"file"` for `FileAction` and `EditAction`. `EmailAction` rows go to `OrchestratorEmail` via `decnet.orchestrator.emailgen.events.to_row`. +- `topic_for(action)` — `orchestrator.{traffic|file}.{dst_uuid}`. +- `event_type_for(action)` — discriminator string for SSE replay. + +### `decnet/orchestrator/drivers/` + +- `decnet/orchestrator/drivers/base.py::ActivityDriver` — ABC. `run(action) -> ActivityResult` is abstract; `plant_file(decky, path, bytes, mode, mtime)` and `read_file(decky, path) -> bytes` default to `NotImplementedError` so drivers without a write transport can opt out cleanly. +- `decnet/orchestrator/drivers/__init__.py::get_driver_for(action)` — factory: `TrafficAction`/`FileAction`/`EditAction` → `SSHDriver`; `EmailAction` → `EmailDriver`. Same lazy-import pattern as `decnet.canary.factory`. +- `decnet/orchestrator/drivers/ssh.py::SSHDriver` — concrete docker-exec driver. `plant_file` streams base64 bytes via stdin (ARG_MAX-safe; mirrors `decnet.canary.planter` commit `c17b9e0`) and applies `touch -d` for the realism-sampled mtime so files don't all stamp at wall-clock-now. `read_file` runs `docker exec ... cat ` and raises `FileNotFoundError` cleanly. `_run_edit` reads the previous body from `EditAction.previous_body`, calls `realism.bodies.next_iteration`, and re-plants. +- `decnet/orchestrator/drivers/email.py::EmailDriver` — RFC 2822 EML build, IMAP/POP3 spool delivery via `docker exec ... tee`. 
LLM call goes through `decnet.realism.llm`. + +### `decnet/orchestrator/emailgen/` + +Email-specific delivery and threading. After stage 5 of the realism migration it has no worker / CLI / systemd unit of its own; the orchestrator drives it. + +- `scheduler.py::pick(repo)` — picks a mail decky + sender + recipient, optionally a parent thread. Returns `EmailAction`. +- `events.py` — DB-row + bus-topic builders for `OrchestratorEmail` rows. +- `threads.py` — RFC 2822 thread-chain helpers (Message-ID generation, `Re: ` / `In-Reply-To` bookkeeping). + +The persona schema, prompt builder, LLM glue, and global persona pool moved to `decnet/realism/` in stage 2 — see [Module Reference — Core](Module-Reference-Core) for those. + +Reads: `repo.list_running_deckies`, `repo.get_topology`, `repo.pick_random_synthetic_file_for_edit`, `repo.list_synthetic_files`. Writes: `repo.record_orchestrator_event`, `repo.record_orchestrator_email`, `repo.record_synthetic_file`, `repo.update_synthetic_file`, `repo.create_canary_token` (via the cultivator). + +See [Realism](Realism) for the operator-level walkthrough of content classes, persona pool, work-hours gating, edit-in-place, LLM enrichment, and canary cultivation. diff --git a/Realism.md b/Realism.md new file mode 100644 index 0000000..a8f65b0 --- /dev/null +++ b/Realism.md @@ -0,0 +1,180 @@ +# Realism + +The realism content engine is what makes DECNET deckies look *lived-in*. Without it, a deployed honeypot has a frozen filesystem, mailboxes that never grow, and timestamps clustered at deploy time. Attackers notice. The realism library — `decnet/realism/` — drives the orchestrator's per-tick file plants and email drops so each decky grows files at plausible hours, with persona-conditioned names and bodies, occasionally edited in place, and very rarely seeded with callback-bearing canaries. + +This is the operator-facing guide. 
For the underlying module surface see [Module Reference — Workers § Orchestrator](Module-Reference-Workers#orchestrator--decnetorchestrator). + +## Why this exists + +Pre-realism, the orchestrator's file plants looked like this on a deployed decky: + +```text +$ ls /home/admin/ +notes-1777254307.txt notes-1777260507.txt notes-1777266693.txt notes-1777274923.txt +$ cat notes-1777254307.txt +todo: rotate keys; check on backup task +``` + +Two tells: + +- **Filenames are unix epochs.** No real user names a file `notes-1777315854.txt`. They write `notes.txt`, `TODO.md`, `keys.txt`. +- **Identical bodies.** Every `notes-*.txt` had the same one-line content because the generator was three hardcoded templates. + +The realism engine fixes both — and adds edit-in-place, diurnal pacing, optional LLM enrichment, and canary cultivation on the same pacing. + +## Architecture in one paragraph + +The orchestrator ticks every 60 s and rolls a weighted action kind: 45 % SSH traffic, 45 % file plant or edit, 10 % email. The file branch asks the realism planner for a `Plan` (decky, persona, content_class, action, mtime, body hint). The planner enforces a diurnal gate (only personas in their `active_hours` window are considered), weights content classes (user > system > canary), and decides create / edit / leave-alone. The plan flows through the SSH driver, which writes the bytes via base64-on-stdin `docker exec` with a backdated mtime via `touch -d`. After a successful plant or edit the worker persists or patches a `synthetic_files` row so the next tick can edit it again. When LLM enrichment is enabled, user-class bodies get one Ollama round-trip each; on timeout / error / breaker-trip the deterministic template is the fallback. + +## Content classes + +Every planted artifact maps to exactly one `ContentClass` member (defined in `decnet/realism/taxonomy.py`). 
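
As a rough sketch, the taxonomy could be modeled like the enum below. The member values are the class names documented in this section; the `category` and `llm_eligible` helpers are hypothetical conveniences, and the real definition in `decnet/realism/taxonomy.py` (described elsewhere as a StrEnum) may be laid out differently. A `str`-mixin `Enum` is used here only to keep the sketch portable.

```python
from enum import Enum


class ContentClass(str, Enum):
    # User-authored content.
    NOTE = "note"
    TODO = "todo"
    DRAFT = "draft"
    SCRIPT = "script"
    # System content.
    LOG_CRON = "log_cron"
    LOG_DAEMON = "log_daemon"
    CACHE_TMP = "cache_tmp"
    # Mail-decky content.
    EMAIL = "email"
    # Canary variants.
    CANARY_AWS_CREDS = "canary_aws_creds"
    CANARY_ENV_FILE = "canary_env_file"
    CANARY_GIT_CONFIG = "canary_git_config"
    CANARY_SSH_KEY = "canary_ssh_key"
    CANARY_HONEYDOC = "canary_honeydoc"
    CANARY_HONEYDOC_DOCX = "canary_honeydoc_docx"
    CANARY_HONEYDOC_PDF = "canary_honeydoc_pdf"
    CANARY_MYSQL_DUMP = "canary_mysql_dump"


_USER = {ContentClass.NOTE, ContentClass.TODO, ContentClass.DRAFT, ContentClass.SCRIPT}
_SYSTEM = {ContentClass.LOG_CRON, ContentClass.LOG_DAEMON, ContentClass.CACHE_TMP}


def category(cc: ContentClass) -> str:
    """Hypothetical helper: map a class to its category (user/system/email/canary)."""
    if cc in _USER:
        return "user"
    if cc in _SYSTEM:
        return "system"
    if cc is ContentClass.EMAIL:
        return "email"
    return "canary"


def llm_eligible(cc: ContentClass) -> bool:
    """Only user-class and email content is documented as LLM-eligible."""
    return category(cc) in {"user", "email"}
```

Looking a member up by its stored string, for example `ContentClass("canary_ssh_key")`, is what makes the values usable as a persisted column and as bus-event discriminants.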
+ +| Class | Category | LLM-eligible | Examples | +|---|---|---|---| +| `note` | user | yes | `~/notes.txt`, `~/scratch.md`, `~/keys.txt` | +| `todo` | user | yes | `~/TODO.md`, `~/todo.txt`, `~/things.md` | +| `draft` | user | yes | `~/Q3-budget-DRAFT.md`, `~/proposal.md` | +| `script` | user | yes | `~/backup.sh`, `~/cleanup.sh`, `~/fix.py` | +| `log_cron` | system | no | `/var/log/cron.log`, `/var/log/cron.log.1`, `/var/log/cron.log.2.gz` | +| `log_daemon` | system | no | `/var/log/daemon.log`, `/var/log/syslog`, `/var/log/auth.log` | +| `cache_tmp` | system | no | `/tmp/.cache-XXXXXX` (mkstemp shape) | +| `email` | email | yes | mail-decky maildir contents | +| `canary_aws_creds` | canary | no | `~/.aws/credentials` (passive) | +| `canary_env_file` | canary | no | `~/app/.env` (HTTP callback) | +| `canary_git_config` | canary | no | `~/.git/config` (HTTP callback) | +| `canary_ssh_key` | canary | no | `~/.ssh/id_rsa` (DNS callback in comment) | +| `canary_honeydoc` | canary | no | `~/Documents/notes.html` (HTTP callback) | +| `canary_honeydoc_docx` | canary | no | `~/Documents/Q3-Operations-Review.docx` (DOCX with remote 1×1 image) | +| `canary_honeydoc_pdf` | canary | no | same as docx, PDF flavour | +| `canary_mysql_dump` | canary | no | `/var/backups/db_backup.sql` (replica-handshake DNS phone-home) | + +System-class content is **deliberately** template-only. Real cron logs *are* formulaic — an LLM-authored cron log is more suspicious than a templated one. Canary classes are also template-only because their generators are deterministic by design (re-seeding from the same callback token must produce the same bytes for planter idempotency). + +## Personas + +Personas are fictional employees the realism engine writes *as*. Each persona carries: + +- `name`, `email`, `role` — basic identity. +- `tone` — `formal` / `direct` / `casual` / `technical` / `custom` — drives the LLM voice. 
+- `mannerisms` — short list of stylistic ticks; 1–2 are randomly picked into each prompt. +- `language` — ISO 639-1; the LLM is instructed not to code-switch. +- `active_hours` — `"HH:MM-HH:MM"`, supports wrap-around (`"22:00-06:00"`). The planner skips a persona outside its window. +- `signature` — optional verbatim block for emails. +- `uses_llms_heavily` — opt-out for the em-dash suppression (see below). + +### Two pools + +- **Topology pool** — `Topology.email_personas`, edited per topology via the dashboard's Persona Generation page (`/topologies/:id/personas`). MazeNET-topology deckies use this. +- **Global pool** — a JSON file on disk, edited via `/realism/personas` on the dashboard or `decnet realism import-personas ` on the CLI. Fleet (MACVLAN/IPVLAN) and SWARM-shard deckies use this. Path resolution: `$DECNET_REALISM_PERSONAS` → `/etc/decnet/email_personas.json` → `~/.decnet/email_personas.json`. + +Files vary by user (admin vs ubuntu vs service), so a single decky can host files from multiple personas — the planner samples per tick, persists the picked persona on the `synthetic_files` row, and never binds one decky to a single fictional employee. + +### Em-dash suppression + +Em-dashes (`—`) are a strong stylometric tell for LLM-authored prose. By default the prompt builder instructs the model to avoid them, and a belt-and-braces `strip_em_dashes` substitutes any that slip through. Personas with `uses_llms_heavily=true` opt out — they're meant to look like the kind of person who really does write that way. + +## Diurnal gating + +Two helpers in `decnet/realism/diurnal.py`: + +- `in_work_hours(window, now)` — gate the planner so a persona's files only appear inside the persona's window. Wrap-around is supported. Malformed windows fail open (a typo never silences the whole fleet). +- `sample_mtime(window, now, *, backdate_min_hours=0.5, backdate_max_days=14.0)` — return a backdated `datetime` whose hour-of-day falls inside the window. 
Drivers pass this to `touch -d` after every plant. The hour-snap is skipped when the candidate already lands in the window; when it has to snap, the result is shifted back at least one day so it stays in the past. + +Net effect: a `~/TODO.md` planted during admin's 09:00–18:00 window will report mtimes inside that window, biased toward "edited recently" but never wall-clock-now. + +## Edit-in-place + +When the planner picks `action="edit"`, the orchestrator reads the previous body from the `synthetic_files` row, asks `realism.bodies.next_iteration` for a plausible mutation, writes it back with a fresh in-window mtime, and increments `edit_count`. Per content_class: + +- **TODO** — flip an unchecked box to `[x]`, append a new item, or both. +- **Note / draft / script** — append a new line / paragraph / comment. +- **Log_cron / log_daemon** — append a new syslog line (logs are append-only). + +Canary classes, `cache_tmp`, and `email` don't support edits — the planner filters them out at candidate-selection time. + +## LLM enrichment + +Optional. When `DECNET_REALISM_LLM` is set to an enabling value (`ollama` / `fake` / etc.; `off`, `none`, `0`, `false`, and `disabled` count as disabled), the orchestrator builds an `LLMBackend` at startup and passes it through every tick. For user-class file bodies (`note` / `todo` / `draft` / `script`) the worker: + +1. Builds a class-conditioned prompt (`decnet/realism/prompts/filebody.py`). +2. Calls `await asyncio.wait_for(llm.generate(prompt), timeout=DECNET_REALISM_TIMEOUT)`. +3. Falls back to the deterministic template on `LLMTimeout`, error, empty output, or non-success. +4. Strips em-dashes from the output, unless the persona sets `uses_llms_heavily=true`. + +System-class content (logs, /tmp caches) and canary classes never invoke the LLM — those are template-only by design. + +### Circuit breaker + +The per-call timeout protects one tick from one wedged Ollama; the breaker (`decnet/realism/llm/circuit.py`) protects the worker from a *sustained* problem. 
After 3 consecutive failures it flips open and short-circuits subsequent calls to the template fallback for 60 s, then half-opens to probe — success closes, failure re-opens with a fresh cooldown. State is process-local. Counters reset on any single success. + +## Canary cultivation + +Roughly **3 %** of file ticks land on a canary class. The cultivator (`decnet/canary/cultivator.py`): + +1. Maps the `canary_*` content_class to a generator name (`canary_aws_creds` → `aws_creds`, `canary_mysql_dump` → `mysql_dump`, …). +2. Mints a fresh `callback_token` (16 url-safe bytes). +3. Builds a `CanaryContext` from `$DECNET_CANARY_HTTP_BASE` and `$DECNET_CANARY_DNS_ZONE`. +4. Calls the generator for the bytes. +5. Persists a `canary_tokens` row before the plant so the canary worker can attribute callbacks even on plant-time previews. +6. Returns a `CanaryArtifact` with the placement path resolved per-class (`~/.aws/credentials`, `~/.ssh/id_rsa`, `/var/backups/db_backup.sql`, …). + +Required env: `DECNET_CANARY_HTTP_BASE` for HTTP-callback generators and `DECNET_CANARY_DNS_ZONE` for DNS-callback ones (`ssh_key`, `mysql_dump`); passive bait such as `aws_creds` needs neither. When a generator's required variable is unset, the cultivator raises and the orchestrator falls through to a non-canary plan, so the tick isn't wasted. + +Stealth: the cultivator never adds the `DECNET` literal to artifact bytes. The underlying generators are already stealth-clean. A test asserts the contract holds (`tests/canary/test_cultivator.py::test_cultivate_artifact_does_not_leak_decnet_string`). + +### Volume and rate + +Canary tokens are real: each carries a real DNS subdomain, a real HTTP slug, a real `canary_tokens` row, and (when tripped) a real alert. The 3 % gate is conservative on purpose — flooding the fleet makes the dashboard noisy and explodes the alert surface. To plant more canaries, raise `_CANARY_PROBABILITY` in `decnet/realism/planner.py`; to plant fewer, lower it. 
There is no per-decky daily cap today (planner-level), but the per-`(decky_uuid, path)` UNIQUE on `synthetic_files` provides natural deduplication. + +## Storage + +Two tables back this: + +- `synthetic_files` — per-`(decky_uuid, path)` row. Carries `persona`, `content_class`, `created_at`, `last_modified`, `edit_count`, `content_hash`, `last_body` (capped at 64 KB). Schema in `decnet/web/db/models/realism.py`. +- `canary_tokens` — existing canary-subsystem table; cultivator writes one row per canary plant. + +Two tables already in production receive the orchestrator's per-tick events: + +- `orchestrator_events` — `kind ∈ {"traffic", "file"}`. Includes `EditAction` rows under `kind="file"`, `action="file:edit"`. +- `orchestrator_emails` — `EmailAction` rows. + +## Configuration + +| Env var | Default | Effect | +|---|---|---| +| `DECNET_REALISM_LLM` | unset | Backend selector (`ollama` / `fake` / `off`). Unset / `off` / `none` / `0` / `false` / `disabled` disables enrichment; any other value enables. | +| `DECNET_REALISM_MODEL` | `llama3.1` | Ollama model name. | +| `DECNET_REALISM_TIMEOUT` | `60` | Per-call wall-clock cap (seconds). | +| `DECNET_REALISM_PERSONAS` | `/etc/decnet/email_personas.json` | Global pool path override. | +| `DECNET_CANARY_HTTP_BASE` | unset | HTTP callback base (`https://canary.example.test`). | +| `DECNET_CANARY_DNS_ZONE` | unset | DNS zone (`canary.example.test`). | + +Per-host overrides go in the orchestrator unit's `EnvironmentFile` (`{install_dir}/.env.local`), see [Systemd-Setup](Systemd-Setup). + +## CLI surface + +- `decnet orchestrate [--llm/--no-llm]` — the long-running worker. See [CLI Reference § decnet orchestrate](CLI-Reference#decnet-orchestrate). +- `decnet realism import-personas ` — validate and install the global persona pool. See [CLI Reference § decnet realism import-personas](CLI-Reference#decnet-realism-import-personas). 
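
The way the `--llm` / `--no-llm` flag and `DECNET_REALISM_LLM` combine can be sketched roughly as below. This is an illustration under stated assumptions, not the body of `_llm_should_enable`: the function name, its signature, and the case-insensitive comparison are hypothetical, while the precedence (explicit flag first, then env, then disabled) and the list of disabling values follow the documentation above.

```python
import os
from typing import Mapping, Optional

# Values documented as disabling enrichment; an unset variable is treated as empty.
_DISABLED_VALUES = {"", "off", "none", "0", "false", "disabled"}


def llm_should_enable(
    cli_flag: Optional[bool],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Resolve the LLM-enabled flag: explicit CLI flag wins, otherwise the env decides."""
    if cli_flag is not None:
        # --llm or --no-llm was passed explicitly; ignore the environment.
        return cli_flag
    raw = env.get("DECNET_REALISM_LLM", "")
    # Assumption: comparison is case-insensitive and ignores surrounding whitespace.
    return raw.strip().lower() not in _DISABLED_VALUES
```

For example, `llm_should_enable(False, {"DECNET_REALISM_LLM": "ollama"})` resolves to disabled, matching the documented "force template-only, ignore env" behaviour of `--no-llm`. The sketch only resolves the boolean; which backend is built is a separate decision.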
+
+## Dashboard
+
+The dashboard's **Persona Generation** page edits both pools (per-topology and global). A synthetic-files browser ("files this decky has grown") and an LLM-status panel are open follow-ups; the data is already persisted, just not yet rendered.
+
+## Migration history
+
+The realism library was extracted from the original `decnet/orchestrator/emailgen/` worker in eight stages. Stage notes live in commit messages on `dev`; the highlights:
+
+- Stage 2 — `emailgen/personas`, `emailgen/prompt`, `emailgen/global_pool`, `emailgen/llm/` moved into `decnet/realism/`. Env-var rename `DECNET_EMAILGEN_*` → `DECNET_REALISM_*` (clean break, pre-v1).
+- Stage 4 — `ActivityDriver` ABC + `get_driver_for(action)` factory; `SSHDriver.plant_file` streams base64 via stdin (ARG_MAX-safe), honours `mtime`.
+- Stage 5 — service collapse: `decnet-emailgen.service` deleted, `decnet emailgen run` deleted, `EmailAction` joined `TrafficAction` / `FileAction` in the orchestrator's tick. API URL `/api/v1/emailgen/personas` → `/api/v1/realism/personas`. CLI `decnet emailgen import-personas` → `decnet realism import-personas`.
+
+For the full story, run `git log --oneline | grep realism` on `dev`.
+
+## See also
+
+- [Module Reference — Workers § Orchestrator](Module-Reference-Workers#orchestrator--decnetorchestrator)
+- [Service-Bus](Service-Bus#topics) — `orchestrator.{traffic,file,email}.{decky_id}` topics
+- [CLI Reference](CLI-Reference#decnet-orchestrate)
+- [Environment Variables](Environment-Variables#realism-content-engine)
+- [Security and Stealth](Security-and-Stealth) — em-dash policy, no-DECNET-literal contract
diff --git a/Roadmap-and-Known-Debt.md b/Roadmap-and-Known-Debt.md
index 9c9368e..0d1c7b9 100644
--- a/Roadmap-and-Known-Debt.md
+++ b/Roadmap-and-Known-Debt.md
@@ -18,12 +18,17 @@ DECNET keeps its forward-looking and backward-looking planning docs inside the m
 
 ## Audits and Coverage
 
-- `development/REALISM_AUDIT.md` — decoy realism audit notes.
+- `development/REALISM_AUDIT.md` — decoy realism audit notes (pre-migration; see [[Realism]] for the current content-engine surface).
 - `development/COVERAGE.md` — test coverage state.
 - `development/EVENTS.md` — event pipeline and schema notes.
 
 Each of these files lives in the DECNET repo, not this wiki. Follow the links above from a working checkout.
 
+## Recently closed
+
+- **Orchestrator file generation looked obviously fake** — the pre-realism `decnet orchestrate` shipped three hardcoded templates that produced epoch-suffixed filenames (`notes-1777315854.txt`) with identical bodies. Closed by the realism migration (8 stages on `dev`, summarised in [[Realism#migration-history]]). Files now use plausible names, persona-conditioned bodies, edit-in-place over time, diurnal-sampled mtimes, optional LLM enrichment with a process-local circuit breaker, and 3 % canary cultivation on the same pacing.
+- **Two workers for one shape** — `decnet-emailgen.service` was a sibling worker doing the same tick-driven work as `decnet orchestrate`. The migration collapsed it; one fewer systemd unit (down to 20).
+
 ---
 
 See also: [[Home]] · [[Developer-Guide]] · [[Troubleshooting]]
diff --git a/Service-Bus.md b/Service-Bus.md
index 7bee913..fb1153a 100644
--- a/Service-Bus.md
+++ b/Service-Bus.md
@@ -148,6 +148,11 @@ Current topic families:
 | `topology.{id}.status` | Mutator | `{state, reason}` |
 | `decky.{id}.state` | _reserved_ | — |
 | `decky.{id}.traffic` | _reserved_ | — |
+| `orchestrator.traffic.{decky_id}` | Orchestrator | `{kind: "traffic", protocol: "ssh", action, src_decky_uuid, dst_decky_uuid, success, payload, ts}` — synthetic inter-decky SSH traffic generated to keep the fleet from looking suspiciously static |
+| `orchestrator.file.{decky_id}` | Orchestrator | `{kind: "file", protocol: "ssh", action, dst_decky_uuid, success, payload, ts}` — synthetic file create or edit (`action="file:create"` / `"file:edit"`) performed inside a decky via `docker exec`. Driven by the realism planner; rare ticks (~3%) carry callback-bearing canary content. |
+| `orchestrator.email.{decky_id}` | Orchestrator | `{kind: "email", mail_decky_uuid, thread_id, message_id, in_reply_to, sender_email, recipient_email, subject, language, success, ts}` — one persona-driven fake corporate email, generated by the realism content engine and dropped into a mail decky's spool. `decky_id` is the mail decky (the IMAP/POP3 host serving the mailbox). Producer changed from a separate `emailgen` worker to the unified orchestrator in the realism migration; topic shape is unchanged. |
+| `system.orchestrator.health` | Orchestrator | standard worker heartbeat (covers traffic + file + email branches) |
+| `system.orchestrator.control` | Orchestrator | admin-originated stop intents for the orchestrator loop |
 | `attacker.observed` | Correlator | first sighting; consumed by `decnet enrich` as a wake signal |
 | `attacker.scored` | Profiler | post-enrichment score update; also wakes `decnet enrich` |
 | `attacker.intel.enriched` | `decnet enrich` | `{attacker_ip, aggregate_verdict, providers}` after a threat-intel pass; webhook → SIEM |
@@ -160,6 +165,10 @@ Current topic families:
 | `campaign.identity.assigned` | Campaign clusterer | `{campaign_uuid, identity_uuid}` — identity attached / re-attached to a campaign |
 | `campaign.merged` | Campaign clusterer | `{winner_uuid, loser_uuid, identity_uuids: [...]}` — two campaigns collapsed; subscribers re-key cached references to the winner |
 | `campaign.unmerged` | Campaign clusterer | `{resurrected_uuid, former_winner_uuid, identity_uuids: [...]}` — revocable-merge undo at the campaign layer |
+| `canary.{token_id}.placed` | Planter (API + deploy hook) | `{token_id, decky_id, kind, instrumenter?, placement_path, placed_at}` — a canary artifact was successfully written into a decky's filesystem (or a passive token persisted) |
+| `canary.{token_id}.triggered` | `decnet canary` worker | `{token_id, decky_id, src_ip, user_agent?, request_path?, dns_qname?, occurred_at, raw_headers?}` — attacker hit the HTTP slug or DNS subdomain; correlator + webhook fanout consume to attribute and forward |
+| `canary.{token_id}.revoked` | API (`DELETE /tokens/{id}`) | `{token_id, decky_id, revoked_at}` — operator removed a token; subscribers may evict cached lookups by token id |
+| `system.canary.health` | `decnet canary` worker | standard worker heartbeat |
 | `system.log` | _reserved_ | — |
 | `system.bus.health` | Bus worker heartbeat | `{ts, uptime_s}` |
 
diff --git a/Systemd-Setup.md b/Systemd-Setup.md
index 6077560..01acf4d 100644
--- a/Systemd-Setup.md
+++ b/Systemd-Setup.md
@@ -330,6 +330,62 @@ WantedBy=multi-user.target
 
 Required env: `DECNET_SYSTEM_LOGS` plus any per-service override files referenced by the mutator.
 
+### `/etc/systemd/system/decnet-orchestrator.service`
+
+Single worker that injects synthetic life into the running fleet —
+inter-decky SSH traffic, persona-driven file plants and edits, fake
+corporate email drops, occasional canary cultivation. After the
+realism migration this one unit covers what `decnet-orchestrator`
+and `decnet-emailgen` did separately. The bundled template lives at
+`deploy/decnet-orchestrator.service.j2` and includes
+`SupplementaryGroups=docker` (the worker drives `docker exec`
+against decky containers) plus the realism env block.
+
+```ini
+[Unit]
+Description=DECNET Orchestrator (synthetic life — traffic + file plants + email)
+After=network-online.target decnet-bus.service decnet-api.service
+Wants=decnet-bus.service decnet-api.service
+
+[Service]
+Type=simple
+User=decnet
+Group=decnet
+WorkingDirectory=/opt/DECNET
+EnvironmentFile=-/opt/DECNET/.env.local
+Environment=DECNET_SYSTEM_LOGS=/var/log/decnet/decnet.orchestrator.log
+# LLM enrichment — opt-in. Unset / empty / off / none / 0 / false / disabled
+# stays on deterministic templates. Any other value enables.
+Environment=DECNET_REALISM_LLM=
+Environment=DECNET_REALISM_MODEL=llama3.1
+Environment=DECNET_REALISM_TIMEOUT=60
+Environment=DECNET_REALISM_PERSONAS=/etc/decnet/email_personas.json
+ExecStart=/opt/DECNET/.venv/bin/decnet orchestrate
+
+# docker-exec drives decky writes; needs the docker group.
+SupplementaryGroups=docker
+
+NoNewPrivileges=yes
+ProtectSystem=full
+ProtectHome=read-only
+
+Restart=on-failure
+RestartSec=5
+
+[Install]
+WantedBy=multi-user.target
+```
+
+The realism content engine is documented separately in
+[[Realism]]. Set `DECNET_CANARY_HTTP_BASE` and
+`DECNET_CANARY_DNS_ZONE` in `.env.local` to enable callback-bearing
+canary cultivation; without them the cultivator falls back to
+non-canary plans for generators that need a callback host.
+
+There is no separate `decnet-emailgen.service` after the migration.
+If your deploy still references one, drop the entry from
+`decnet.target` and remove the unit file.
+
 ## 7. Enable and start
 
 ```bash
@@ -340,7 +396,8 @@ sudo systemctl enable --now \
   decnet-profiler.service \
   decnet-sniffer.service \
   decnet-collect.service \
-  decnet-mutate.service
+  decnet-mutate.service \
+  decnet-orchestrator.service
 ```
 
 Check status and tail logs:
diff --git a/_Sidebar.md b/_Sidebar.md
index 1facde3..a429a0b 100644
--- a/_Sidebar.md
+++ b/_Sidebar.md
@@ -27,6 +27,7 @@
 - [Systemd-Setup](Systemd-Setup)
 - [Logging-and-Syslog](Logging-and-Syslog)
 - [Service-Bus](Service-Bus)
+- [Realism](Realism)
 - [Web-Dashboard](Web-Dashboard)
 - [REST-API-Reference](REST-API-Reference)
 - [Mutation-and-Randomization](Mutation-and-Randomization)
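Reviewer note on the LLM circuit breaker described in the Realism page (trip after 3 consecutive failures, 60 s cooldown, half-open probe, reset on any success): the behaviour can be sketched in Python roughly as follows. The class name `CircuitBreaker`, the `enrich` helper, and the injectable `clock` are hypothetical names chosen for this illustration; the real implementation in `decnet/realism/` may be structured differently.

```python
import time


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after cooldown."""

    def __init__(self, threshold=3, cooldown_s=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow(self):
        """True if the caller may try the LLM; False means use the template."""
        if self._opened_at is None:
            return True
        # Open: allow a probe only once the cooldown has elapsed (half-open).
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self):
        # Any single success resets the counter and closes the breaker.
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._opened_at is not None or self._failures >= self.threshold:
            # Trip, or re-open after a failed probe, with a fresh cooldown.
            self._opened_at = self._clock()


def enrich(breaker, call_llm, template):
    """Return the LLM body when available, else the deterministic template."""
    if not breaker.allow():
        return template
    try:
        body = call_llm()
    except Exception:
        breaker.record_failure()
        return template
    breaker.record_success()
    return body
```

The state lives on the instance, which matches the "process-local" wording. This sketch assumes a single-threaded tick loop, so it does not guard against several probes running at once in the half-open state.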