feat(realism): canary cultivator on the realism contract

Stage 7 — final stage of the realism migration. Canary plants are
now scheduled by the same realism planner that handles inert content,
keeping the orchestrator as the single decision point and avoiding
duplicate diurnal / persona / rate-limit logic in the canary
subsystem.

New surface:

- decnet/canary/cultivator.py: cultivate(plan, repo) builds a
  CanaryContext, calls the matching generator (canary_aws_creds ->
  aws_creds, canary_mysql_dump -> mysql_dump, …), and persists the
  canary_tokens row before planting, so the canary worker can
  attribute callbacks even on plant-time previews. It also resolves
  canary placements to credible operator paths (~/.aws/credentials,
  ~/.ssh/id_rsa, /var/backups/db_backup.sql).
- realism/planner.py adds 8 canary content_classes, uniformly
  weighted, behind a 3% probability gate. Hard-capped at one canary
  per tick; when the gate does not fire, the create branch falls
  through to inert content.
- scheduler.pick_file dispatches canary content classes to the
  cultivator; FileAction grows an optional content_bytes field so
  binary canary artifacts (DOCX/PDF/honeydoc) survive the wire
  intact instead of being round-tripped through UTF-8.
- SSHDriver._run_file uses content_bytes when set, falls back to
  encoding the str content otherwise.
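
The content_bytes fallback described in the last two bullets can be sketched roughly as follows. This is an illustration only, not the code from this commit: the FileAction fields other than content_bytes and the helper name payload_for are assumptions, and "when set" is read here as "is not None".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileAction:
    # Hypothetical stand-in for the real FileAction. Only content_bytes
    # is named in the commit message; path and content are assumed.
    path: str
    content: str = ""
    content_bytes: Optional[bytes] = None


def payload_for(action: FileAction) -> bytes:
    """Return raw bytes when content_bytes is set, else the encoded str."""
    if action.content_bytes is not None:
        return action.content_bytes
    return action.content.encode("utf-8")


# A PDF-like header containing bytes that are not valid UTF-8.
binary = b"%PDF-1.4\n\xe2\xe3\xcf\xd3\n"

# Forcing the same bytes through a str changes them, which is the
# failure mode a bytes field avoids.
lossy = binary.decode("utf-8", errors="replace").encode("utf-8")
```

Carrying the bytes separately keeps text content on the existing str path while binary artifacts skip decoding altogether.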

Stealth (per feedback_stealth.md): cultivator does not introduce
any DECNET literal; the underlying generators are already
stealth-clean and the test suite asserts the contract holds.

Tests cover round-tripping every canary class through the cultivator,
verifying placement-path conventions, persona-login normalisation
("John Smith" -> /home/johnsmith/.aws/credentials), and the
no-DECNET-leak invariant.
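
A minimal sketch of the persona-login normalisation the tests exercise. The exact rule is an assumption: the commit message only gives the "John Smith" example, so this version lowercases the name and drops every character outside a-z and 0-9. The names login_for and home_path are hypothetical.

```python
import posixpath
import re


def login_for(persona_name: str) -> str:
    """Assumed rule: lowercase, then drop anything that is not a-z or 0-9."""
    return re.sub(r"[^a-z0-9]", "", persona_name.lower())


def home_path(persona_name: str, relative: str) -> str:
    """Join a path relative to the persona's home directory."""
    return posixpath.join("/home", login_for(persona_name), relative)


print(home_path("John Smith", ".aws/credentials"))
# prints: /home/johnsmith/.aws/credentials
```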
2026-04-27 16:47:59 -04:00
parent 4e436da569
commit a07fb3fe08
6 changed files with 392 additions and 10 deletions


@@ -45,6 +45,21 @@ _SYSTEM_CLASS_WEIGHTS: tuple[tuple[ContentClass, int], ...] = (
    (ContentClass.LOG_DAEMON, 8),
    (ContentClass.CACHE_TMP, 5),
)
# Canary classes are picked rarely. Each plant materialises a real
# CanaryToken row + DNS slug + HTTP URL; flooding the fleet with
# canaries makes the dashboard noisy and the per-decky alert surface
# explode. ~3% of file picks land here.
_CANARY_CLASS_WEIGHTS: tuple[tuple[ContentClass, int], ...] = (
    (ContentClass.CANARY_AWS_CREDS, 1),
    (ContentClass.CANARY_ENV_FILE, 1),
    (ContentClass.CANARY_GIT_CONFIG, 1),
    (ContentClass.CANARY_SSH_KEY, 1),
    (ContentClass.CANARY_HONEYDOC, 1),
    (ContentClass.CANARY_HONEYDOC_DOCX, 1),
    (ContentClass.CANARY_HONEYDOC_PDF, 1),
    (ContentClass.CANARY_MYSQL_DUMP, 1),
)
_CANARY_PROBABILITY = 0.03


def _weighted_pick(
@@ -117,6 +132,33 @@ def pick(
    decky, persona = rng.choice(eligible)

    # Canary first: they're rare (~3% of file picks), uniformly
    # weighted across generators. Falling here means the orchestrator
    # plants a callback-bearing artifact this tick instead of an
    # inert one.
    if rng.random() < _CANARY_PROBABILITY:
        content_class = _weighted_pick(_CANARY_CLASS_WEIGHTS, rng)
        # Canary placement is the cultivator's job: plan.target_path
        # is advisory; a "" lets the cultivator override entirely.
        target_path = ""
        body_hint = None
        mtime = sample_mtime(persona.active_hours, now, rand=rng)
        return Plan(
            decky_uuid=decky["uuid"],
            decky_name=decky["name"],
            persona=persona.name,
            content_class=content_class,
            action="create",
            target_path=target_path,
            mtime=mtime,
            body_hint=body_hint,
            notes=(
                f"persona={persona.name}",
                f"class={content_class.value}",
                "kind=canary",
            ),
        )

    # User vs system content, biased toward user (realism wins are
    # bigger there).
    if rng.random() < 0.7: