Realism
anti edited this page 2026-04-27 17:23:41 -04:00


The realism content engine is what makes DECNET deckies look lived-in. Without it, a deployed honeypot has a frozen filesystem, mailboxes that never grow, and timestamps clustered at deploy time. Attackers notice. The realism library — decnet/realism/ — drives the orchestrator's per-tick file plants and email drops so each decky grows files at plausible hours, with persona-conditioned names and bodies, occasionally edited in place, and very rarely seeded with callback-bearing canaries.

This is the operator-facing guide. For the underlying module surface see Module Reference — Workers § Orchestrator.

Why this exists

Pre-realism, the orchestrator's file plants looked like this on a deployed decky:

```console
$ ls /home/admin/
notes-1777254307.txt  notes-1777260507.txt  notes-1777266693.txt  notes-1777274923.txt
$ cat notes-1777254307.txt
todo: rotate keys; check on backup task
```

Two tells:

  • Filenames are Unix epoch timestamps. No real user names a file notes-1777315854.txt; they write notes.txt, TODO.md, keys.txt.
  • Identical bodies. Every notes-*.txt had the same one-line content, because the generator drew from just three hardcoded templates.

The realism engine fixes both — and adds edit-in-place, diurnal pacing, optional LLM enrichment, and canary cultivation on the same pacing.

Architecture in one paragraph

The orchestrator ticks every 60 s and rolls a weighted action kind: 45 % SSH traffic, 45 % file plant or edit, 10 % email. The file branch asks the realism planner for a Plan (decky, persona, content_class, action, mtime, body hint). The planner enforces a diurnal gate (only personas in their active_hours window are considered), weights content classes (user > system > canary), and decides create / edit / leave-alone. The plan flows through the SSH driver, which writes the bytes via base64-on-stdin docker exec with a backdated mtime via touch -d. After a successful plant or edit the worker persists or patches a synthetic_files row so the next tick can edit it again. When LLM enrichment is enabled, user-class bodies get one Ollama round-trip each; on timeout / error / breaker-trip the deterministic template is the fallback.
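The tick's first decision and the SSH driver's write can be sketched in Python as below. This is an illustration under stated assumptions, not the orchestrator's code: ACTION_WEIGHTS, roll_action_kind and build_plant_command are hypothetical names, and the exact docker exec command line is assumed (it relies on base64 and a GNU-style touch -d being available inside the decky container).

```python
from __future__ import annotations

import base64
import random
from datetime import datetime

# Hypothetical weights mirroring the 45 / 45 / 10 split described above.
ACTION_WEIGHTS = {"traffic": 45, "file": 45, "email": 10}


def roll_action_kind(rng: random.Random) -> str:
    """Pick one action kind for this tick, weighted by ACTION_WEIGHTS."""
    kinds = list(ACTION_WEIGHTS)
    weights = [ACTION_WEIGHTS[k] for k in kinds]
    return rng.choices(kinds, weights=weights, k=1)[0]


def build_plant_command(container: str, path: str, body: bytes,
                        mtime: datetime) -> tuple[list[str], bytes]:
    """Return (argv, stdin) for a base64-on-stdin plant with a backdated mtime.

    The body travels on stdin rather than argv, so large files cannot hit
    ARG_MAX. Path and timestamp are positional shell arguments ($1, $2),
    not interpolated into the script, which sidesteps quoting bugs.
    """
    script = 'base64 -d > "$1" && touch -d "$2" "$1"'
    stamp = mtime.strftime("%Y-%m-%d %H:%M:%S")  # a format GNU touch -d accepts
    argv = ["docker", "exec", "-i", container, "sh", "-c", script, "sh", path, stamp]
    return argv, base64.b64encode(body)
```

The caller would hand argv to a subprocess and write the returned bytes to its stdin. The trailing "sh" fills $0 for sh -c, so the path and timestamp land in $1 and $2.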

Content classes

Every planted artifact maps to exactly one ContentClass member (defined in decnet/realism/taxonomy.py).

| Class | Category | LLM-eligible | Examples |
|---|---|---|---|
| note | user | yes | ~/notes.txt, ~/scratch.md, ~/keys.txt |
| todo | user | yes | ~/TODO.md, ~/todo.txt, ~/things.md |
| draft | user | yes | ~/Q3-budget-DRAFT.md, ~/proposal.md |
| script | user | yes | ~/backup.sh, ~/cleanup.sh, ~/fix.py |
| log_cron | system | no | /var/log/cron.log, /var/log/cron.log.1, /var/log/cron.log.2.gz |
| log_daemon | system | no | /var/log/daemon.log, /var/log/syslog, /var/log/auth.log |
| cache_tmp | system | no | /tmp/.cache-XXXXXX (mkstemp shape) |
| email | email | yes | mail-decky maildir contents |
| canary_aws_creds | canary | no | ~/.aws/credentials (passive) |
| canary_env_file | canary | no | ~/app/.env (HTTP callback) |
| canary_git_config | canary | no | ~/.git/config (HTTP callback) |
| canary_ssh_key | canary | no | ~/.ssh/id_rsa (DNS callback in comment) |
| canary_honeydoc | canary | no | ~/Documents/notes.html (HTTP callback) |
| canary_honeydoc_docx | canary | no | ~/Documents/Q3-Operations-Review.docx (DOCX with remote 1×1 image) |
| canary_honeydoc_pdf | canary | no | same as docx, PDF flavour |
| canary_mysql_dump | canary | no | /var/backups/db_backup.sql (replica-handshake DNS phone-home) |

System-class content is deliberately template-only. Real cron logs are formulaic — an LLM-authored cron log is more suspicious than a templated one. Canary classes are also template-only because their generators are deterministic by design (re-seeding from the same callback token must produce the same bytes for planter idempotency).
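The table can be mirrored in code roughly as follows. This is a sketch only: the real ContentClass lives in decnet/realism/taxonomy.py and its base class and member names may differ, and the category and llm_eligible properties here are hypothetical helpers derived from the table, not part of the documented API.

```python
from enum import Enum


class ContentClass(str, Enum):
    """Illustrative mirror of the content-class table (hypothetical layout)."""
    NOTE = "note"
    TODO = "todo"
    DRAFT = "draft"
    SCRIPT = "script"
    LOG_CRON = "log_cron"
    LOG_DAEMON = "log_daemon"
    CACHE_TMP = "cache_tmp"
    EMAIL = "email"
    CANARY_AWS_CREDS = "canary_aws_creds"
    CANARY_ENV_FILE = "canary_env_file"
    CANARY_GIT_CONFIG = "canary_git_config"
    CANARY_SSH_KEY = "canary_ssh_key"
    CANARY_HONEYDOC = "canary_honeydoc"
    CANARY_HONEYDOC_DOCX = "canary_honeydoc_docx"
    CANARY_HONEYDOC_PDF = "canary_honeydoc_pdf"
    CANARY_MYSQL_DUMP = "canary_mysql_dump"

    @property
    def category(self) -> str:
        if self.value.startswith("canary_"):
            return "canary"
        if self in _SYSTEM:
            return "system"
        if self is ContentClass.EMAIL:
            return "email"
        return "user"

    @property
    def llm_eligible(self) -> bool:
        # Only user-class files and email bodies may use the LLM;
        # system and canary content stay template-only.
        return self.category in ("user", "email")


# Defined after the class; the property looks it up at call time.
_SYSTEM = frozenset({ContentClass.LOG_CRON, ContentClass.LOG_DAEMON, ContentClass.CACHE_TMP})
```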

Personas

Personas are fictional employees the realism engine writes as. Each persona carries:

  • name, email, role: basic identity.
  • tone: formal / direct / casual / technical / custom; drives the LLM voice.
  • mannerisms: short list of stylistic tics; one or two are randomly picked into each prompt.
  • language: ISO 639-1 code; the LLM is instructed not to code-switch.
  • active_hours: "HH:MM-HH:MM", supports wrap-around ("22:00-06:00"). The planner skips a persona outside its window.
  • signature: optional verbatim block for emails.
  • uses_llms_heavily: opt-out for the em-dash suppression (see below).
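A persona record might be modelled like this. The dataclass is a hypothetical sketch: field names follow the list above, but the defaults (direct tone, "en", a 09:00-18:00 window), the validation rules and from_dict are assumptions, not DECNET's actual persona loader.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional

_TONES = {"formal", "direct", "casual", "technical", "custom"}


@dataclass(frozen=True)
class Persona:
    """Hypothetical in-memory shape of one persona."""
    name: str
    email: str
    role: str
    tone: str = "direct"
    mannerisms: tuple[str, ...] = ()
    language: str = "en"                 # ISO 639-1 code
    active_hours: str = "09:00-18:00"    # wrap-around windows allowed
    signature: Optional[str] = None      # verbatim email signature block
    uses_llms_heavily: bool = False      # True keeps em-dashes in LLM output

    def __post_init__(self) -> None:
        if self.tone not in _TONES:
            raise ValueError(f"unknown tone {self.tone!r}")
        if len(self.language) != 2 or not self.language.isalpha():
            raise ValueError(f"language must be ISO 639-1, got {self.language!r}")

    @classmethod
    def from_dict(cls, raw: dict) -> Persona:
        """Build from a JSON-style dict, ignoring keys this sketch does not model."""
        known = {f.name for f in fields(cls)}
        data = {k: v for k, v in raw.items() if k in known}
        data["mannerisms"] = tuple(data.get("mannerisms", ()))
        return cls(**data)
```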

Two pools

  • Topology pool: Topology.email_personas, edited per topology via the dashboard's Persona Generation page (/topologies/:id/personas). MazeNET-topology deckies use this.
  • Global pool: a JSON file on disk, edited via /realism/personas on the dashboard or decnet realism import-personas <file> on the CLI. Fleet (MACVLAN/IPVLAN) and SWARM-shard deckies use this. Path resolution, in order of precedence: $DECNET_REALISM_PERSONAS, then /etc/decnet/email_personas.json, then ~/.decnet/email_personas.json.
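The global-pool lookup order can be sketched as follows. resolve_personas_path is a hypothetical name, and two details are assumptions: that the /etc path is only chosen if the file exists, and that the per-user path is returned even when missing (so an import has somewhere to write).

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

SYSTEM_PATH = Path("/etc/decnet/email_personas.json")
USER_PATH = Path("~/.decnet/email_personas.json")


def resolve_personas_path(env: Optional[Mapping[str, str]] = None,
                          system_path: Path = SYSTEM_PATH,
                          user_path: Path = USER_PATH) -> Path:
    """Return the global persona pool file, first match wins."""
    env = os.environ if env is None else env
    override = env.get("DECNET_REALISM_PERSONAS")
    if override:                      # explicit override always wins
        return Path(override).expanduser()
    if system_path.exists():          # fleet-wide file installed by the operator
        return system_path
    return user_path.expanduser()     # per-user fallback, may not exist yet
```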

Files vary by user (admin vs ubuntu vs service), so a single decky can host files from multiple personas — the planner samples per tick, persists the picked persona on the synthetic_files row, and never binds one decky to a single fictional employee.

Em-dash suppression

Em-dashes (U+2014) are a strong stylometric tell for LLM-authored prose. By default the prompt builder instructs the model to avoid them, and a belt-and-braces strip_em_dashes pass substitutes any that slip through. Personas with uses_llms_heavily=true opt out: they're meant to look like the kind of person who really does write that way.
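The belt-and-braces pass amounts to a regex substitution. In this sketch the replacement (a comma plus space, swallowing surrounding whitespace) and the keyword-argument opt-out are assumptions; the real strip_em_dashes may choose a different substitute.

```python
import re

# U+2014 plus any whitespace around it.
_EM_DASH = re.compile(r"\s*\u2014\s*")


def strip_em_dashes(text: str, *, uses_llms_heavily: bool = False) -> str:
    """Replace em-dashes with ", " unless the persona opted out."""
    if uses_llms_heavily:
        return text  # persona is meant to write this way: leave dashes alone
    return _EM_DASH.sub(", ", text)
```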

Diurnal gating

Two helpers in decnet/realism/diurnal.py:

  • in_work_hours(window, now) — gate the planner so a persona's files only appear inside the persona's window. Wrap-around is supported. Malformed windows fail open (a typo never silences the whole fleet).
  • sample_mtime(window, now, *, backdate_min_hours=0.5, backdate_max_days=14.0) — return a backdated datetime whose hour-of-day falls inside the window. Drivers pass this to touch -d after every plant. The hour-snap is skipped when the candidate already lands in window; when it has to snap, the result is shifted back at least one day so it stays in the past.

Net effect: a ~/TODO.md planted during admin's 09:00-18:00 window will report mtimes inside that window, biased toward "edited recently" but never wall-clock-now.
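Both helpers can be approximated as below. The signatures follow the ones quoted above plus a hypothetical rng parameter for reproducibility; the internals (window parsing, the squared-uniform bias toward recent mtimes, how the snap picks a minute) are assumptions, not the contents of decnet/realism/diurnal.py.

```python
from __future__ import annotations

import random
from datetime import datetime, time, timedelta


def _parse_window(window: str) -> tuple[time, time]:
    """Split "HH:MM-HH:MM" into two times; raises ValueError if malformed."""
    start_s, end_s = window.split("-")
    return time.fromisoformat(start_s.strip()), time.fromisoformat(end_s.strip())


def _contains(start: time, end: time, t: time) -> bool:
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # wrap-around window, e.g. 22:00-06:00


def in_work_hours(window: str, now: datetime) -> bool:
    try:
        start, end = _parse_window(window)
    except (ValueError, AttributeError):
        return True  # fail open: a typo must not silence the fleet
    return _contains(start, end, now.time())


def sample_mtime(window: str, now: datetime, *, backdate_min_hours: float = 0.5,
                 backdate_max_days: float = 14.0,
                 rng: random.Random | None = None) -> datetime:
    rng = rng or random.Random()
    lo, hi = backdate_min_hours * 3600, backdate_max_days * 86400
    # Squaring a uniform draw skews toward 0, i.e. toward "edited recently".
    candidate = now - timedelta(seconds=lo + (hi - lo) * rng.random() ** 2)
    try:
        start, end = _parse_window(window)
    except (ValueError, AttributeError):
        return candidate
    if _contains(start, end, candidate.time()):
        return candidate  # already in window: skip the hour-snap
    start_min = start.hour * 60 + start.minute
    width = (end.hour * 60 + end.minute - start_min) % (24 * 60)
    if width == 0:
        return candidate  # zero-width window: nothing to snap to
    snapped = datetime.combine(candidate.date(), start, tzinfo=candidate.tzinfo)
    snapped += timedelta(minutes=rng.randrange(width))
    # Shift back one day; the snapped time then always precedes the candidate.
    return snapped - timedelta(days=1)
```

The one-day shift is what keeps a snapped mtime in the past: snapping onto the candidate's own date could otherwise land later in the day than now.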

Edit-in-place

When the planner picks action="edit", the orchestrator reads the previous body from the synthetic_files row, asks realism.bodies.next_iteration for a plausible mutation, writes it back with a fresh in-window mtime, and increments edit_count by one. Per content_class:

  • todo: flip an unchecked box to [x], append a new item, or both.
  • note / draft / script: append a new line / paragraph / comment.
  • log_cron / log_daemon: append a new syslog line (logs are append-only).

Canary classes, cache_tmp, and email don't support edits — the planner filters them out at candidate-selection time.
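The per-class mutation rules can be sketched like this. Everything here is hypothetical: SyntheticFile is a trimmed stand-in for the synthetic_files row, the "- [ ]" checkbox syntax is assumed, and new_line stands in for whatever text the real realism.bodies.next_iteration generates.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SyntheticFile:
    """Hypothetical subset of a synthetic_files row."""
    path: str
    content_class: str
    last_body: str
    edit_count: int = 0


def is_editable(content_class: str) -> bool:
    # Canary classes, cache_tmp and email never get edited.
    return not (content_class in {"cache_tmp", "email"} or content_class.startswith("canary_"))


def next_iteration(content_class: str, body: str, rng: random.Random, new_line: str) -> str:
    if not is_editable(content_class):
        raise ValueError(f"{content_class} does not support edits")
    lines = body.splitlines()
    if content_class == "todo":
        unchecked = [i for i, line in enumerate(lines) if line.lstrip().startswith("- [ ]")]
        mode = rng.choice(("flip", "append", "both")) if unchecked else "append"
        if mode in ("flip", "both"):
            i = rng.choice(unchecked)
            lines[i] = lines[i].replace("- [ ]", "- [x]", 1)
        if mode in ("append", "both"):
            lines.append(f"- [ ] {new_line}")
    else:
        # note / draft / script append text; log classes append a log line.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def apply_edit(row: SyntheticFile, rng: random.Random, new_line: str) -> SyntheticFile:
    body = next_iteration(row.content_class, row.last_body, rng, new_line)
    return replace(row, last_body=body, edit_count=row.edit_count + 1)
```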

LLM enrichment

Optional. When DECNET_REALISM_LLM names a backend (ollama / fake / etc.) rather than being unset or set to one of the disabling values listed under Configuration, the orchestrator builds an LLMBackend at startup and passes it through every tick. For user-class file bodies (note / todo / draft / script) the worker:

  1. Builds a class-conditioned prompt (decnet/realism/prompts/filebody.py).
  2. Calls await asyncio.wait_for(llm.generate(prompt), timeout=DECNET_REALISM_TIMEOUT).
  3. Falls back to the deterministic template on LLMTimeout, error, empty output, or non-success.
  4. Strips em-dashes on the way out, unless the persona sets uses_llms_heavily=true.

System-class content (logs, /tmp caches) and canary classes never invoke the LLM — those are template-only by design.
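Steps 2 and 3 reduce to a wait_for guarded by a fallback. In this sketch the LLMBackend protocol (an async generate returning a plain string) and the name enrich_body are assumptions; the real backend may return a richer result object carrying a success flag.

```python
from __future__ import annotations

import asyncio
from typing import Protocol


class LLMBackend(Protocol):
    async def generate(self, prompt: str) -> str: ...


async def enrich_body(llm: LLMBackend | None, prompt: str, template_body: str,
                      timeout: float) -> str:
    """Return an LLM-written body, or the deterministic template on any failure."""
    if llm is None:
        return template_body  # enrichment disabled
    try:
        text = await asyncio.wait_for(llm.generate(prompt), timeout=timeout)
    except asyncio.TimeoutError:
        return template_body  # a wedged call costs at most `timeout` seconds
    except Exception:
        return template_body  # backend error: fall back, never fail the tick
    if not text or not text.strip():
        return template_body  # empty output
    return text
```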

Circuit breaker

The per-call timeout protects one tick from one wedged Ollama; the breaker (decnet/realism/llm/circuit.py) protects the worker from a sustained problem. After 3 consecutive failures it flips open and short-circuits subsequent calls to the template fallback for 60 s, then half-opens to probe — success closes, failure re-opens with a fresh cooldown. State is process-local. Counters reset on any single success.
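The breaker's state machine is small enough to sketch in full. This is an illustrative consecutive-failure breaker with the documented numbers (3 failures, 60 s cooldown), not the code in decnet/realism/llm/circuit.py; the injectable clock is an assumption added so the behaviour can be tested without sleeping.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class CircuitBreaker:
    """closed -> open after `threshold` failures -> half-open after `cooldown`."""

    def __init__(self, threshold: int = 3, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown:
            return "half_open"
        return "open"

    def allow(self) -> bool:
        """True if a real LLM call may proceed (closed, or a half-open probe)."""
        return self.state != "open"

    def record_success(self) -> None:
        self._failures = 0        # any single success resets the counter
        self._opened_at = None    # and closes the breaker

    def record_failure(self) -> None:
        self._failures += 1
        if self._opened_at is not None or self._failures >= self.threshold:
            # Trip, or re-trip after a failed half-open probe, with a fresh cooldown.
            self._opened_at = self._clock()
```

The worker would check allow() before calling the backend, use the template when it returns False, and report the outcome through record_success or record_failure.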

Canary cultivation

Roughly 3 % of file ticks land on a canary class. The cultivator (decnet/canary/cultivator.py):

  1. Maps the canary_* content_class to a generator name (canary_aws_creds → aws_creds, canary_mysql_dump → mysql_dump, …).
  2. Mints a fresh callback_token (16 url-safe bytes).
  3. Builds a CanaryContext from $DECNET_CANARY_HTTP_BASE and $DECNET_CANARY_DNS_ZONE.
  4. Calls the generator for the bytes.
  5. Persists a canary_tokens row before plant so the canary worker can attribute callbacks even on plant-time previews.
  6. Returns a CanaryArtifact with the placement path resolved per-class (~/.aws/credentials, ~/.ssh/id_rsa, /var/backups/db_backup.sql, …).

Required env: at least DECNET_CANARY_HTTP_BASE for HTTP-callback generators, DECNET_CANARY_DNS_ZONE for DNS-callback ones (ssh_key, mysql_dump). Without them the cultivator raises and the orchestrator falls through to a non-canary plan — the tick isn't wasted.
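Steps 1 to 3 and the env-driven fallback can be sketched as below. The names CanaryContext fields, generator_name, build_context and plan_with_fallback are hypothetical; the prefix-stripping mapping, the RuntimeError type, and the idea that the passive aws_creds generator needs neither variable are assumptions. Only the token size (16 url-safe bytes) and the two env var names come from the text.

```python
from __future__ import annotations

import os
import secrets
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

_DNS_GENERATORS = {"ssh_key", "mysql_dump"}
_PASSIVE_GENERATORS = {"aws_creds"}


@dataclass(frozen=True)
class CanaryContext:
    http_base: Optional[str]
    dns_zone: Optional[str]
    callback_token: str


def generator_name(content_class: str) -> str:
    # Assumed mapping: strip the "canary_" prefix.
    if not content_class.startswith("canary_"):
        raise ValueError(f"not a canary class: {content_class}")
    return content_class[len("canary_"):]


def build_context(content_class: str, env: Optional[Mapping[str, str]] = None) -> CanaryContext:
    env = os.environ if env is None else env
    gen = generator_name(content_class)
    http_base = env.get("DECNET_CANARY_HTTP_BASE") or None
    dns_zone = env.get("DECNET_CANARY_DNS_ZONE") or None
    if gen in _DNS_GENERATORS and not dns_zone:
        raise RuntimeError(f"{gen} needs DECNET_CANARY_DNS_ZONE")
    if gen not in (_DNS_GENERATORS | _PASSIVE_GENERATORS) and not http_base:
        raise RuntimeError(f"{gen} needs DECNET_CANARY_HTTP_BASE")
    # 16 random bytes, URL-safe base64 without padding (22 characters).
    return CanaryContext(http_base, dns_zone, secrets.token_urlsafe(16))


def plan_with_fallback(content_class: str, fallback_class: str,
                       env: Optional[Mapping[str, str]] = None):
    """Missing config downgrades the tick to a non-canary plan instead of wasting it."""
    try:
        return content_class, build_context(content_class, env)
    except RuntimeError:
        return fallback_class, None
```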

Stealth: the cultivator never adds the DECNET literal to artifact bytes. The underlying generators are already stealth-clean. A test asserts the contract holds (tests/canary/test_cultivator.py::test_cultivate_artifact_does_not_leak_decnet_string).

Volume and rate

Canary tokens are real: each carries a real DNS subdomain, a real HTTP slug, a real canary_tokens row, and (when tripped) a real alert. The 3 % gate is deliberately conservative: flooding the fleet makes the dashboard noisy and explodes the alert surface. To plant more, raise _CANARY_PROBABILITY in decnet/realism/planner.py; to plant fewer, lower it. There is no planner-level per-decky daily cap today, but the UNIQUE constraint on (decky_uuid, path) in synthetic_files provides natural deduplication.
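A gate of this shape could look like the sketch below. _CANARY_PROBABILITY matches the constant named above, but pick_category and the 70 / 30 user/system split of the remaining ticks are hypothetical; the text only says user outweighs system, which outweighs canary.

```python
import random

_CANARY_PROBABILITY = 0.03
# Hypothetical split of the remaining 97 %; only "user > system" is documented.
_USER_WEIGHT, _SYSTEM_WEIGHT = 0.7, 0.3


def pick_category(rng: random.Random) -> str:
    """Roll the content category for one file tick."""
    if rng.random() < _CANARY_PROBABILITY:
        return "canary"
    return rng.choices(("user", "system"), weights=(_USER_WEIGHT, _SYSTEM_WEIGHT), k=1)[0]
```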

Storage

Two tables back this:

  • synthetic_files — per-(decky_uuid, path) row. Carries persona, content_class, created_at, last_modified, edit_count, content_hash, last_body (capped at 64 KB). Schema in decnet/web/db/models/realism.py.
  • canary_tokens — existing canary-subsystem table; cultivator writes one row per canary plant.
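The create-or-patch behaviour of synthetic_files can be illustrated with plain sqlite3 and an UPSERT (SQLite 3.24 or newer). The real schema is an ORM model in decnet/web/db/models/realism.py; the column types, the SHA-256 content_hash and the single record_plant entry point are assumptions for this sketch.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

LAST_BODY_CAP = 64 * 1024  # bytes

SCHEMA = """
CREATE TABLE IF NOT EXISTS synthetic_files (
    decky_uuid    TEXT NOT NULL,
    path          TEXT NOT NULL,
    persona       TEXT NOT NULL,
    content_class TEXT NOT NULL,
    created_at    TEXT NOT NULL,
    last_modified TEXT NOT NULL,
    edit_count    INTEGER NOT NULL DEFAULT 0,
    content_hash  TEXT NOT NULL,
    last_body     TEXT NOT NULL,
    UNIQUE (decky_uuid, path)
)
"""


def _cap(body: str) -> str:
    raw = body.encode("utf-8")[:LAST_BODY_CAP]
    return raw.decode("utf-8", errors="ignore")  # drop a split trailing multibyte char


def record_plant(conn: sqlite3.Connection, decky_uuid: str, path: str, persona: str,
                 content_class: str, body: str, mtime: datetime) -> None:
    """Insert a new row, or patch the existing one and bump edit_count."""
    ts = mtime.isoformat()
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    conn.execute(
        """
        INSERT INTO synthetic_files (decky_uuid, path, persona, content_class,
            created_at, last_modified, edit_count, content_hash, last_body)
        VALUES (?, ?, ?, ?, ?, ?, 0, ?, ?)
        ON CONFLICT (decky_uuid, path) DO UPDATE SET
            last_modified = excluded.last_modified,
            edit_count    = synthetic_files.edit_count + 1,
            content_hash  = excluded.content_hash,
            last_body     = excluded.last_body
        """,
        (decky_uuid, path, persona, content_class, ts, ts, digest, _cap(body)),
    )
```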

Two tables already in production receive the orchestrator's per-tick events:

  • orchestrator_events: kind ∈ {"traffic", "file"}. Includes EditAction rows under kind="file", action="file:edit".
  • orchestrator_emails: EmailAction rows.

Configuration

| Env var | Default | Effect |
|---|---|---|
| DECNET_REALISM_LLM | unset | Backend selector (ollama / fake / off). Unset, off, none, 0, false or disabled turns enrichment off; any other value enables it. |
| DECNET_REALISM_MODEL | llama3.1 | Ollama model name. |
| DECNET_REALISM_TIMEOUT | 60 | Per-call wall-clock cap (seconds). |
| DECNET_REALISM_PERSONAS | /etc/decnet/email_personas.json | Global pool path override. |
| DECNET_CANARY_HTTP_BASE | unset | HTTP callback base (https://canary.example.test). |
| DECNET_CANARY_DNS_ZONE | unset | DNS zone (canary.example.test). |

Per-host overrides go in the orchestrator unit's EnvironmentFile ({install_dir}/.env.local); see Systemd-Setup.
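Reading the realism variables might look like this. RealismConfig and load_config are hypothetical names; the defaults and the disabling values come from the table, while case-insensitive matching and whitespace trimming of DECNET_REALISM_LLM are assumptions.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

_DISABLED = {"", "off", "none", "0", "false", "disabled"}


@dataclass(frozen=True)
class RealismConfig:
    llm_backend: Optional[str]   # None means enrichment is off
    model: str
    timeout: float               # seconds, per LLM call
    personas_path: Optional[str]


def load_config(env: Optional[Mapping[str, str]] = None) -> RealismConfig:
    env = os.environ if env is None else env
    raw = env.get("DECNET_REALISM_LLM", "").strip().lower()
    return RealismConfig(
        llm_backend=None if raw in _DISABLED else raw,
        model=env.get("DECNET_REALISM_MODEL", "llama3.1"),
        timeout=float(env.get("DECNET_REALISM_TIMEOUT", "60")),
        personas_path=env.get("DECNET_REALISM_PERSONAS"),
    )
```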

CLI surface

Dashboard

The dashboard's Persona Generation page edits both pools (per-topology and global). A synthetic-files browser ("files this decky has grown") and an LLM-status panel are open follow-ups; the data is already persisted, just not yet rendered.

Migration history

The realism library was extracted from the original decnet/orchestrator/emailgen/ worker in eight stages. Stage notes live in commit messages on dev; the highlights:

  • Stage 2: emailgen/personas, emailgen/prompt, emailgen/global_pool, emailgen/llm/ moved into decnet/realism/. Env-var rename DECNET_EMAILGEN_* → DECNET_REALISM_* (clean break, pre-v1).
  • Stage 4: ActivityDriver ABC + get_driver_for(action) factory; SSHDriver.plant_file streams base64 via stdin (ARG_MAX-safe) and honours mtime.
  • Stage 5: service collapse. decnet-emailgen.service deleted, decnet emailgen run deleted, EmailAction joined TrafficAction / FileAction in the orchestrator's tick. API URL /api/v1/emailgen/personas → /api/v1/realism/personas. CLI decnet emailgen import-personas → decnet realism import-personas.

For the full story, run git log --oneline | grep realism on dev.

See also