Three independent issues conspired to make stress tests record 0 requests:
1. Every virtual user did /auth/login in on_start. With 1000 users in a
spike window, bcrypt-bound logins never finished and on_start failed
for all users — aggregated requests stayed at 0. Pre-fetch a single
admin token in the fixture (cached per-host) and pass it via
DECNET_STRESS_TOKEN so locust users skip the login storm.
2. Locust exits non-zero on any request failure by default, causing
run_locust to throw away an otherwise valid stats CSV. Pass
--exit-code-on-error 0 so per-test assertions are the only fail gate.
3. test_stress_sustained ran two locust subprocesses against the same
uvicorn. Phase 1's keep-alive connections wedged phase 2 into 0
recorded requests ~2/3 of the time. Refactored stress_server into a
start_stress_server() context manager and gave each phase its own
uvicorn.
Stable 3/3 on full suite, 3/3 on test_stress_sustained alone.
Brings the federation-gossip columns on AttackerIdentity to life —
ja3_hashes, hassh_hashes, and the new tls_cert_sha256 — by projecting
the union of every member observation's fingerprints JSON onto the
identity at clusterer create / link / merge time.
- decnet/profiler/identity_rollup.py: pure extract_fp_summaries()
reads the production bounty shape (payload.fingerprint_type +
payload.{ja3,hash,cert_sha256}) and returns deduped+sorted JSON
list[str] per family, or None when a family has no signal so the
column stays NULL instead of '[]'.
- BaseRepository.update_identity_fingerprints + SQLModel impl: one
idempotent write that overwrites the three summary columns and
bumps updated_at.
- ConnectedComponentsClusterer: after every per-component
reconciliation (fresh-create OR existing-merge+link), recomputes
and writes the rollup for the target identity. Wrapped in a
best-effort helper so a write failure logs but never breaks the
tick.
- Tests: extract_fp_summaries unit (dedup, sort determinism,
unknown types ignored, malformed JSON, nested-stringified
payloads, non-string values); end-to-end clusterer ticks
populate the columns on create + on later observation links;
no-fingerprint clusters keep the columns NULL.
- FpCertificate renders the new cert_sha256 field (truncated, with
full hash on hover) and a FROM line carrying the prober-side
target_ip/port so the source is visible.
- tls_certificate payloads split on target_ip presence: prober certs
land under ACTIVE PROBES, sniffer certs under PASSIVE FINGERPRINTS.
Two synthetic fpType keys (tls_certificate_active /
tls_certificate_passive) drive the bucketing without disturbing
the on-the-wire fingerprint_type.
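The target_ip split can be sketched as follows. This is an illustration only: the dashboard code is not shown here, and bucket_fp_type plus the dict payload shape are assumptions. The two synthetic keys are the ones named above.

```python
from typing import Any, Mapping


def bucket_fp_type(payload: Mapping[str, Any]) -> str:
    """Return the display-side fpType key for a fingerprint payload.

    Hypothetical helper: only tls_certificate payloads are re-keyed, and the
    on-the-wire fingerprint_type is never modified.
    """
    fp_type = payload.get("fingerprint_type", "")
    if fp_type != "tls_certificate":
        return fp_type
    # Prober certs carry the target they connected to; sniffer certs do not.
    if payload.get("target_ip"):
        return "tls_certificate_active"
    return "tls_certificate_passive"
```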
JARM probes are crafted ClientHellos with weird ciphers — they never
complete a real handshake, so the peer cert isn't reachable from
those sockets. After a non-empty JARM hash proves the port speaks
TLS, do a separate ssl.wrap_socket() against the same (ip, port) to
fetch and parse the leaf cert.
- decnet/prober/tlscert.py: fetch + parse via cryptography lib;
swallows all connect/handshake/parse failures (returns None).
- decnet/prober/worker.py::_capture_tls_cert: emits a tls_certificate
event with subject_cn / issuer / SANs / validity / SHA-256 +
publishes on the bus. Wired from _jarm_phase only when JARM
succeeds, so non-TLS ports never trigger a second connect.
- Tests cover happy path, cert-fetch failure, defense-in-depth crash,
empty-JARM skip, publish_fn, and parser edge cases (garbage DER,
empty bytes, missing SAN extension, non-self-signed).
Adds storage for TLS certificate details collected from attacker-run
servers by the active prober (sibling to the existing JARM probe).
- AttackerIdentity.tls_cert_sha256 / Campaign.tls_cert_sha256:
JSON list[str] columns mirroring ja3_hashes / hassh_hashes for
federation gossip.
- ingester clause 9b: emits a 'tls_certificate' fingerprint bounty
when a prober event carries subject_cn (disjoint from the existing
sniffer-gated clause).
- Prober-side capture (ssl.wrap_socket follow-up after JARM) and
profiler rollup land in sibling commits.
The check expects 405 for any HTTP method not declared on a path.
DECNET's topology router has a static `/topologies/services` (GET only)
sibling to a parameterized `/topologies/{topology_id}` (DELETE), so a
DELETE on the static path falls through to the parameterized route and
hits auth, which returns 401 — by design. Leaking 405-vs-401 would let
unauthenticated callers enumerate valid topology UUIDs.
The same shape applies to other static/dynamic sibling pairs across
the API. The check is fundamentally incompatible with that routing
strategy; document the omission inline.
Schemathesis fires up to 3000 examples per endpoint. POST /auth/login
caps at 10/5min per IP, so the second example onward returns 429 and
the positive_data_acceptance check flags it as RejectedPositiveData
(its allowed-status list is hardcoded in schemathesis to
2xx/401/403/404/409/5xx, so OpenAPI tweaks can't fix it).
DECNET_LIMITER_ENABLED=false exists for exactly this case (see
limiter.py docstring on stress/load testing).
Reverts the custom_openapi shim from 5d88346 / 9b1168c — the endpoint
already declares 429 in its responses= map (api_login.py:38), and the
shim turned out to address a problem that wasn't there. Drop the
companion test along with it.
Previous commit advertised 429 on every operation. Only routes
decorated with @limiter.limit can actually return slowapi's 429 —
currently just POST /api/v1/auth/login. Documenting it elsewhere is
dishonest and would mislead clients into expecting a response the
server cannot produce.
Walk slowapi's _route_limits / _dynamic_route_limits registries to
identify decorated endpoints, match them to FastAPI routes by
{module}.{name}, and only inject 429 on those.
Existing per-route 429 declarations (e.g. SSE connection-cap on
events streams via sse_limits) are untouched.
SlowAPI middleware can short-circuit any request with 429 once a
per-route or per-IP rate limit fires (e.g. POST /api/v1/auth/login is
capped at 10/5min). The OpenAPI spec did not declare 429 on any
operation, so schemathesis flagged legitimate rate-limit responses as
RejectedPositiveData / status-code-nonconformance failures.
Override app.openapi to inject a generic 429 response object on every
HTTP operation in the generated schema. Add a contract test that fails
if any operation drops the 429 advertisement.
- swarm/test_swarm_api, swarm/test_heartbeat: replace deprecated
asyncio.get_event_loop().run_until_complete() with asyncio.run();
the former raises in 3.11 once another test has set+closed a loop on
the main thread.
- prober/test_prober_bus, prober/test_prober_worker: extend tcp_fingerprint
mocks with tos/dscp/ecn/server_isn so the worker doesn't KeyError into
the prober_error branch.
- services/test_service_isolation: collector now retries on event-stream
errors instead of exiting; assert it stays running and cancel cleanly.
- live/test_imap_live, live/test_pop3_live: log format emits
outcome="failure", not "failed".
- live/test_service_isolation_live: is_service_container accepts label
OR state-name; rewrite the empty-state test against a synthetic
unlabeled container instead of the host's real fleet.
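For the asyncio item above, a minimal sketch of the replacement pattern. heartbeat_once is a hypothetical stand-in for the coroutine under test, not a name from the suite.

```python
import asyncio


async def heartbeat_once() -> str:
    # Hypothetical coroutine standing in for the code under test.
    await asyncio.sleep(0)
    return "ok"


def call_sync() -> str:
    # asyncio.run() creates a fresh event loop, runs the coroutine, and
    # closes the loop again, so it does not depend on whatever loop state
    # an earlier test left behind on the main thread.
    return asyncio.run(heartbeat_once())
```

Calling call_sync() repeatedly is safe because each call gets its own loop.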
OpenSSH's native syslog ("Failed password", "Connection from",
"Connection closed by …") and the pam_unix lines emitted from sshd's
PAM stack add no signal beyond what auth-helper already captures as
structured login_attempt events. They cluttered the dashboard and
arrived without an SD wrapper, forcing prose-IP heuristics in the
collector.
Add a `:programname, isequal, "sshd" stop` rule above the forwarding
actions in /etc/rsyslog.d/50-journal-forward.conf. pam_unix lines from
sshd inherit programname=sshd so the same rule covers both. sudo /
login / su pam_unix lines keep flowing (different programname), so
post-login privilege escalation telemetry is preserved.
Native sshd and pam_unix lines route through rsyslog without the
relay@55555 SD wrapper and without key=value pairs, so attacker_ip
fell through to "Unknown". Add a prose-IP fallback to both parsers:
anchored patterns (from/rhost/client/src) win first so we never pick
the local listener in "Connection from X port Y on Z port 22", with
a bare-IPv4 scan as the last resort.
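A sketch of the two-tier fallback, assuming IPv4 only. The regexes and the extract_attacker_ip name are illustrative and are not the parsers' actual patterns.

```python
import re
from typing import Optional

_IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
# Anchored form: an address introduced by from / rhost / client / src.
_ANCHORED = re.compile(rf"\b(?:from|rhost|client|src)[=\s]+({_IPV4})\b", re.IGNORECASE)
# Last resort: the first bare IPv4 anywhere in the line.
_BARE = re.compile(rf"\b({_IPV4})\b")


def extract_attacker_ip(line: str) -> Optional[str]:
    """Prefer an anchored match; fall back to the first bare IPv4."""
    match = _ANCHORED.search(line) or _BARE.search(line)
    return match.group(1) if match else None
```

Because the anchored pattern is tried first, the remote peer in "Connection from X port Y on Z port 22" is chosen even when the line also names the local listener.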
The prober writes events with hostname=decnet-prober and target_ip=
<the attacker being fingerprinted>. The parser pulls target_ip into
attacker_ip (it's one of _IP_FIELDS), which is correct for indexing
fingerprints under the attacker — but it had a side effect: every
fingerprinted attacker had two distinct deckies on file (the real
decoy they touched + decnet-prober) and the correlation engine's
traversals() classified that as lateral movement. Live dashboard
showed bogus "dmz-gateway -> decnet-prober" paths and TRAVERSAL
badges on attackers who'd done nothing but knock on the front door.
The prober is internal infrastructure, not a hop. Filter the
"decnet-" namespace out of distinct-decky counts and hop paths in
the engine. Fingerprints stay attached to the attacker profile via
the existing per-IP event index — just no longer as traversal.
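The filter amounts to something like the sketch below. hop_path and is_traversal are hypothetical names, and collapsing repeat visits into first-seen order is an assumption about how the engine builds paths.

```python
from typing import Iterable, List

INTERNAL_PREFIX = "decnet-"  # namespace reserved for internal infrastructure


def hop_path(deckies_in_order: Iterable[str]) -> List[str]:
    """Distinct deckies in first-seen order, skipping internal hosts."""
    path: List[str] = []
    for name in deckies_in_order:
        if name.startswith(INTERNAL_PREFIX):
            continue  # e.g. decnet-prober is infrastructure, not a hop
        if name not in path:
            path.append(name)
    return path


def is_traversal(deckies_in_order: Iterable[str]) -> bool:
    # Lateral movement needs at least two distinct real deckies.
    return len(hop_path(deckies_in_order)) >= 2
```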
Hit live on first VPS deploy: a window between the initial
client.containers.list() snapshot and the client.events() start-event
stream let topology service containers slip through, requiring an
operator restart for them to be picked up.
Two fixes:
* `_watch_events` now wraps the events() call in a retry loop with
exponential backoff (1s -> 30s cap). A docker.errors.APIError, daemon
reload, or SDK stream-decode hiccup used to make the executor task
return cleanly, leaving the collector "running" with no event
subscription. Future container starts were silently dropped until
the unit was restarted.
* New `_reconcile_loop` async task ticks every
DECNET_COLLECTOR_RECONCILE_S (default 30s), re-scans
client.containers.list(), and calls _spawn for any service container
not already in `active`. Belt to the event watcher's suspenders:
even if a start event is dropped during a reconnect window, the
reconciler picks it up within one cycle. Also prunes finished
futures from `active` so the dict is bounded by current container
count rather than agent lifetime churn.
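Both pieces can be sketched in a few lines. The 1s start and 30s cap come from the text above; the doubling factor, the function names, and the use of concurrent.futures.Future as the value type of `active` are assumptions.

```python
import itertools
from concurrent.futures import Future
from typing import Dict, Iterable, Iterator, List


def backoff_delays(initial: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield retry delays that double each time until they hit the cap."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)


def reconcile(running_ids: Iterable[str], active: Dict[str, Future]) -> List[str]:
    """Prune finished futures, then return ids that still need a collector."""
    for cid in [cid for cid, fut in active.items() if fut.done()]:
        del active[cid]
    return [cid for cid in running_ids if cid not in active]


# First seven delays of the schedule.
first_delays = list(itertools.islice(backoff_delays(), 7))
```

In this sketch a container whose future has finished but which is still running is returned for a respawn; whether the real reconciler does that is not stated above.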
The module docstring teaches inline comments — `mode = master # or
"agent"` is the canonical example for the [decnet] section. Python's
configparser does not strip inline comments unless
inline_comment_prefixes is set explicitly, so the comment became part
of the value and
downstream validators rejected it ("mode must be 'agent' or 'master',
got 'master # or \"agent\"'").
Hit live on first VPS deploy: every CLI invocation crashed at import
time with a stack trace that didn't make it obvious the docstring's
example was the trigger. Now the parser does what the docs promise.
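The behaviour is easy to reproduce with the standard library alone. The sample text mirrors the docstring example; read_mode is a hypothetical helper, and the ("#", ";") prefix choice is an assumption about what the fix passes.

```python
import configparser

SAMPLE = '[decnet]\nmode = master # or "agent"\n'


def read_mode(text: str, strip_inline: bool) -> str:
    """Parse the sample and return [decnet] mode, with or without the fix."""
    kwargs = {"inline_comment_prefixes": ("#", ";")} if strip_inline else {}
    parser = configparser.ConfigParser(**kwargs)
    parser.read_string(text)
    return parser["decnet"]["mode"]


before = read_mode(SAMPLE, strip_inline=False)  # comment kept in the value
after = read_mode(SAMPLE, strip_inline=True)    # comment stripped
```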
Decky service containers join their base via `network_mode:
container:<base>` and Docker binds that share at service start time. If
`docker compose up` recreates a base (e.g. ports: changes after a
forwards_l3 toggle) but decides services are unchanged, services keep
a stale FD into the destroyed namespace and end up with only `lo` — so
external traffic hits a closed port on the live base and gets RST.
Hit live on the first VPS deploy: external SSH to the dmz-gateway was
refused while sshd was listening, because base and service netns
inodes had drifted apart. `--always-recreate-deps` makes compose
rebuild every dependent whenever its base is recreated, removing the
race entirely.
The dashboard's /api/* proxy hardcoded 127.0.0.1 as the target host.
That works when the API binds to a wildcard or to loopback, but
breaks the moment an operator binds the API to a specific address —
e.g. a Tailscale IP for tailnet-only deploys: the API stops listening
on loopback entirely and the proxy gets ECONNREFUSED on every request.
The web command now reads DECNET_API_HOST and falls back to loopback
only when the API is on a wildcard (0.0.0.0 / :: / unset). A new
--api-host flag overrides at the CLI level.
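The resolution order reads roughly like this. resolve_proxy_target is a hypothetical name, and treating an empty string the same as unset is an assumption.

```python
import os
from typing import Mapping, Optional

_WILDCARDS = {"", "0.0.0.0", "::"}


def resolve_proxy_target(
    cli_api_host: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the host the dashboard proxy should forward /api/* to."""
    if cli_api_host:
        return cli_api_host  # --api-host wins over the environment
    host = env.get("DECNET_API_HOST", "").strip()
    # A wildcard bind also listens on loopback, so loopback is reachable.
    return "127.0.0.1" if host in _WILDCARDS else host
```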
Without rotation, the syslog listener and per-host collector grow
/var/log/decnet/ without bound — a noisy attacker (or an active
probe storm) fills the disk in hours on a small VPS. New
deploy/logrotate.d/decnet caps at 7 daily rotations or 100 MiB,
whichever comes first, and uses copytruncate because the ingester
and forwarder hold the files open via Python and won't reopen on
a rename rotation.
Wire install / remove into `decnet init` and `decnet init --deinit`
alongside the existing tmpfiles.d / polkit handling.
Refuse to start decnet.web.api when DECNET_MODE=agent (unless the
operator explicitly opts into dual-role with DECNET_DISALLOW_MASTER=
false). The Typer CLI already hides master-only commands on agents,
but a misconfigured systemd unit or a direct uvicorn invocation
would bypass that — now the lifespan itself refuses, before any
worker, DB or bus comes up.
Resolve DECNET_JWT_SECRET eagerly at startup so a missing or known-
bad value fails at boot rather than on the first auth-gated request.
The lazy-load shape stays useful for non-master CLIs.
Add validate_public_binding() called from the master API lifespan: when
DECNET_API_HOST is non-loopback, refuse to start if DECNET_CORS_ORIGINS
still contains a loopback origin (catches the "operator flipped to
0.0.0.0 to make it work and forgot to update CORS" footgun) or if
DECNET_CANARY_HTTP_BASE is plaintext http:// to a non-loopback host.
Log CRITICAL when DECNET_LIMITER_ENABLED=false on a public binding.
The validator no-ops under pytest so unrelated suites don't trip on it.
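A sketch of the two refusal conditions, returning error strings instead of aborting startup. The function name, argument shapes, and the treatment of "localhost" as loopback are assumptions; the real validator's signature is not shown above.

```python
import ipaddress
from typing import Iterable, List, Optional
from urllib.parse import urlsplit


def _is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name other than localhost


def public_binding_errors(
    api_host: str,
    cors_origins: Iterable[str],
    canary_http_base: Optional[str],
) -> List[str]:
    """Return reasons to refuse startup; empty when the config is acceptable."""
    if _is_loopback(api_host):
        return []  # checks only apply to non-loopback bindings
    errors: List[str] = []
    for origin in cors_origins:
        if _is_loopback(urlsplit(origin).hostname or ""):
            errors.append(f"loopback CORS origin on public binding: {origin}")
    base = urlsplit(canary_http_base or "")
    if base.scheme == "http" and not _is_loopback(base.hostname or ""):
        errors.append(f"plaintext canary base: {canary_http_base}")
    return errors
```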
Add DECNET_VERIFY_HOSTNAME env knob; AgentClient and UpdaterClient
consult it when verify_hostname is None, giving production deploys
TLS hostname verification on top of the existing CA + fingerprint pin.
Default off so dev enrollments with mismatched SANs keep working.
Reject symlinks, hardlinks, device nodes and FIFOs in update tarballs;
validate each member's resolved path stays under dest after symlink
resolution; cap uncompressed size at 256 MiB to bound gzip-bomb damage;
strip setuid/setgid bits from extracted modes.
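The member checks can be sketched with the standard tarfile module. vet_members and UnsafeTarball are hypothetical names; the sketch only vets headers (it trusts the size each header declares) and leaves extraction to the caller.

```python
import os
import tarfile
from typing import Iterable, List

MAX_UNCOMPRESSED_BYTES = 256 * 1024 * 1024  # 256 MiB cap from the text above


class UnsafeTarball(ValueError):
    """Raised when a member violates one of the extraction rules."""


def vet_members(members: Iterable[tarfile.TarInfo], dest: str) -> List[tarfile.TarInfo]:
    dest_root = os.path.realpath(dest)
    total = 0
    vetted: List[tarfile.TarInfo] = []
    for member in members:
        # Only regular files and directories pass, which rejects symlinks,
        # hardlinks, device nodes and FIFOs in a single check.
        if not (member.isfile() or member.isdir()):
            raise UnsafeTarball(f"disallowed member type: {member.name}")
        # realpath resolves ".." and any symlinks already present on disk.
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise UnsafeTarball(f"path escapes dest: {member.name}")
        total += member.size
        if total > MAX_UNCOMPRESSED_BYTES:
            raise UnsafeTarball("uncompressed size exceeds cap")
        member.mode &= ~0o6000  # strip setuid and setgid bits
        vetted.append(member)
    return vetted
```

A caller would pass tar.getmembers() through vet_members and extract only the returned list.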
Add an optional sha256 form field to /update and /update-self; the
master client computes and sends it on every push; the executor
refuses to extract on mismatch. mTLS already authenticates the
master, so this is defence-in-depth against in-transit corruption
and gives operators a way to pin "exactly these bytes" for vetted
releases.
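The executor-side check reduces to a digest comparison. digest_matches is a hypothetical name, and accepting upper-case hex is an assumption.

```python
import hashlib
import hmac
from typing import Optional


def digest_matches(tarball: bytes, expected_sha256: Optional[str]) -> bool:
    """True when no digest was supplied or the supplied digest matches."""
    if not expected_sha256:
        return True  # the form field is optional
    actual = hashlib.sha256(tarball).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```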
Both pages now layer on DeckyFleet.css + PersonaGeneration.css and use
the project's house vocabulary — fleet-root shell, page-header with
title-group + actions, btn / btn.violet / btn.ghost, info-banner with
the violet left rule, and the dim/matrix/alert text accents.
RealismConfig: inputs are flush-styled weight-input fields with a
violet focus ring; section heads carry a TOTAL badge; canary rows get
the project's amber accent; canary probability lives in a panel-bordered
slider row.
SyntheticFiles: the inline-styled table is now a styled .files-table
with the standard hover affordance, the filter-row uses tweak-group
label+select pairs, the drawer carries .drawer-eyebrow / .drawer-title
/ .meta-grid in the same style as the canary token drawer, and pager
buttons share the .btn.ghost.small treatment.
No behavioural change.
Single source of truth in decnet_web/src/realism/labels.ts: maps each
ContentClass enum value to a friendly display name ("Note",
"Cron Log", "Canary · AWS Credentials", …). Used by RealismConfig
(weight tables + class filter dropdown) and SyntheticFiles (table row
+ drawer detail).
Canary classes get a subtle amber accent so the dashboard's read of
"this row is callback-bearing" doesn't depend on prefix-spotting in
mono text. Raw enum value still appears in dim mono next to the label
so an operator copy/pasting from logs or grepping the codebase still
finds it.
No backend change: the wire shape is still the snake_case enum; the
beautification is render-time only.
New /realism-config page sits next to Persona Generation and
Synthetic Files under the Automation nav. Editable weight tables for
user / system / canary content classes (with live percent share),
plus a slider for canary_probability.
Wires GET/PUT /api/v1/realism/config — viewer can read; admin
required to save. Validation errors from the API are surfaced inline
rather than swallowed; the SAVE button refreshes from the server's
canonical snapshot so the operator sees exactly what landed (matters
because cross-list entries are silently dropped server-side).
New realism_config table (uuid PK + unique key) + two repo methods
(get/set) backs an admin-only GET/PUT /api/v1/realism/config surface.
The planner now exposes apply_payload(payload) / current_payload() /
reset_to_defaults() and reads its weights through mutable module
globals; pick() resolves the live values each call. Validation
catches negative weights, zero totals, out-of-range canary_probability,
unknown content_class names, and silently drops cross-list entries
(canary class on the user list, etc).
The orchestrator worker calls _refresh_realism_config(repo) on
startup and every 5 ticks (~5min at 60s interval). Operator changes
land within one refresh window with no bus signal — the simpler path
for a knob whose latency tolerance is minutes.
The (decky_uuid VARCHAR(64), path VARCHAR(1024)) UNIQUE constraint
generated a 4352-byte composite key under utf8mb4 (4 bytes/char),
busting MySQL's 3072-byte cap and crashing decnet api on init with:
Specified key was too long; max key length is 3072 bytes
Tighten path to VARCHAR(512) — (64+512)*4 = 2304 bytes, well under
the cap. Real realism + canary placement paths are short
(/home/<persona>/Documents/<file>, ~70 chars); 512 keeps headroom
without the index hassle. Pre-v1, no migration helper.
Adds a regression test pinning the (decky_uuid + path) byte budget so
a future widening fails loudly in CI rather than at MySQL deploy
time.
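The byte budget is plain arithmetic and can be pinned as below. This is a sketch of the calculation, not the regression test itself; composite_key_bytes is a hypothetical helper.

```python
MYSQL_MAX_KEY_BYTES = 3072
UTF8MB4_BYTES_PER_CHAR = 4


def composite_key_bytes(*varchar_lengths: int) -> int:
    """Worst-case index size for a composite key of VARCHAR columns."""
    return sum(varchar_lengths) * UTF8MB4_BYTES_PER_CHAR


old_budget = composite_key_bytes(64, 1024)  # decky_uuid + path before the fix
new_budget = composite_key_bytes(64, 512)   # decky_uuid + path after the fix
```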
Surfaces realism subsystem state on the existing worker heartbeat
extra hook (system.orchestrator.health) — no new bus topic. Payload
carries {llm_enabled, llm_backend, llm_model, llm_breaker_state}, so
the dashboard's worker panel renders a live LLM badge with a colored
breaker-state dot:
closed (green) — LLM healthy
half_open (amber) — cooldown elapsed; next call is a probe
open (red) — short-circuiting to deterministic templates
Heartbeat is the canonical worker self-report channel; piggybacking on
extra(...) avoids a new topic family while keeping the snapshot
recomputed each beat (30s).
New /synthetic-files page sits next to Persona Generation and Canary
Tokens under the Automation nav group. Operators get a paginated
inventory of files realism has grown across the fleet (decky, path,
persona, content_class, last_modified, edit_count, hash) with filters
on decky / persona / content_class.
Decky filter is a dropdown sourced from /deckies — never free text.
Row click opens a drawer with the body preview; the drawer surfaces a
TRUNCATED chip when the stored body is at the 64KB cap.
Adds GET /api/v1/realism/synthetic-files (paginated list, filters by
decky_uuid, persona, content_class) and
GET /api/v1/realism/synthetic-files/{uuid} (single row with last_body
and a truncated:bool flag set when the stored body is at the 64KB cap).
Repo gains count_synthetic_files() and get_synthetic_file(uuid). The
list view drops last_body to keep the wire payload bounded; the detail
endpoint is the only path that returns it. Read-only — orchestrator
remains the sole writer.
FileAction and EditAction both write kind="file" — the discriminator
is action="file:create" vs "file:edit". The dashboard timeline used
to render both identically; now an EDIT sub-chip surfaces edits without
widening the kind enum (which doubles as the bus topic family).
No schema or API change. Polish only.
decnet/canary/cultivator wrote kind="http" for every cultivated
token, even DNS-trip ones (ssh_key, mysql_dump) and passive bait
(aws_creds). The canary worker uses kind to route attacker callbacks
to the right token; a misaligned kind means a real DNS resolution of
ssh_key or mysql_dump never attributes to the planted slug.
Add _GENERATOR_TO_KIND aligned with CanaryKind in models/canary.py
and look it up at create_canary_token time.
decnet/realism/naming._home and decnet/canary/cultivator._persona_login
both normalised "John Smith"→"johnsmith" with identical logic. Lift
to decnet.realism.personas.login_for(persona) and have both consumers
import it. Drift between the two would have left canary placement and
realism path naming using different login derivations.
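A guess at the shape of the shared helper. Only the "John Smith" to "johnsmith" mapping is confirmed above; dropping every non-alphanumeric character is an assumption.

```python
import re


def login_for(persona: str) -> str:
    """Derive a login name from a persona display name."""
    # Assumed rule: lowercase, then drop anything outside [a-z0-9].
    return re.sub(r"[^a-z0-9]", "", persona.lower())


home = f"/home/{login_for('John Smith')}"
```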
The orchestrator worker clipped last_body at write time, but the repo
didn't enforce. A future caller that forgot the clip would write the
full body. Move the clip to record_synthetic_file and
update_synthetic_file via SYNTHETIC_FILE_BODY_LIMIT in
decnet/web/db/models/realism.py. Worker now passes the full body and
trusts the repo. Tests retargeted to assert repo enforcement.
Four gaps from the realism migration plan, plus one flaky test
fixed.
Added:
- tests/deploy/test_orchestrator_unit.py — replaces the dead
test_emailgen_unit.py. Asserts:
* decnet-orchestrator.service.j2 carries the DECNET_REALISM_*
env block (LLM, MODEL, TIMEOUT, PERSONAS) so per-host tuning
works without editing the .j2.
* Legacy DECNET_EMAILGEN_* vars are NOT referenced — clean break
contract from stage 5.
* decnet.target wants orchestrator + canary, does NOT want
decnet-emailgen.service. Anti-regression for service-collapse.
* deploy/decnet-emailgen.service.j2 stays deleted.
- tests/orchestrator/test_worker_integration.py — new
test_one_tick_email_branch_records_orchestrator_email. Pins the
action-roll to email, seeds a topology with an IMAP mail decky +
two personas, stubs LLM + docker-exec write paths, verifies an
orchestrator_emails row + bus event land. Restores end-to-end
email coverage that was lost when the pre-collapse
test_worker_integration.py was deleted.
- tests/realism/test_synthetic_files_truncation.py — pins the 64KB
last_body cap on create + edit, and documents the consequence:
edit candidates carry a truncated snapshot of files that exceeded
the cap. If a future change lifts the cap, _LIMIT in the test
must lift with it.
Fixed flaky:
- tests/orchestrator/test_scheduler.py — two pick_file tests
pinned to random.Random(1). Without a seed, the 3% canary gate
(stage 7) and 10% leave-alone roll occasionally flaked the
assertions because the _FakeRepo doesn't carry a
create_canary_token method.
Note: the existing
test_realism_subprocess_import_personas_rejects_in_agent_mode
already covers agent-mode rejection of decnet realism
import-personas; no new gating test needed.
Stage 7 — final stage of the realism migration. Canary plants are
now scheduled by the same realism planner that handles inert content,
keeping the orchestrator as the single decision point and avoiding
duplicate diurnal / persona / rate-limit logic in the canary
subsystem.
New surface:
- decnet/canary/cultivator.py: cultivate(plan, repo) builds a
CanaryContext, calls the right generator (canary_aws_creds ->
aws_creds, canary_mysql_dump -> mysql_dump, …), persists the
canary_tokens row before plant so the canary worker can attribute
callbacks even on plant-time previews. Resolves canary placements
to credible operator paths (~/.aws/credentials, ~/.ssh/id_rsa,
/var/backups/db_backup.sql).
- realism/planner.py adds 8 canary content_classes uniformly weighted
inside a 3% probability gate. Hard-capped: each tick at most one
canary; create branch falls through to inert otherwise.
- scheduler.pick_file dispatches canary content_class to the
cultivator; FileAction grows an optional content_bytes field so
binary canary artifacts (DOCX/PDF/honeydoc) survive the wire
intact instead of being utf-8 round-tripped.
- SSHDriver._run_file uses content_bytes when set, falls back to
encoding the str content otherwise.
Stealth (per feedback_stealth.md): cultivator does not introduce
any DECNET literal; the underlying generators are already
stealth-clean and the test suite asserts the contract holds.
Tests cover round-tripping every canary class through the cultivator,
verifying placement-path conventions, persona-login normalisation
("John Smith" -> /home/johnsmith/.aws/credentials), and the
no-DECNET-leak invariant.
Stage 6 of the realism migration. User-class file bodies (note,
todo, draft, script) optionally get LLM-authored content; system
classes (cron / daemon logs, /tmp caches) stay template-only because
formulaic *is* the right look for them.
New surface:
- realism.llm.circuit.LLMCircuitBreaker — process-local sliding-window
breaker. 3 consecutive failures trip open; 60s cooldown to half-open;
half-open success closes, failure re-opens. Protects the orchestrator
tick from sustained Ollama wedges (per-call timeout already covers
one-shot hangs).
- realism.prompts._style — em-dash suppression lifted from the
email prompt. Persona.uses_llms_heavily opts out per the
feedback_em_dash_llm_tell.md memory. Includes strip_em_dashes
belt-and-braces sub for output that slipped past the prompt rule.
- realism.prompts.filebody — class-conditioned prompts (note / todo
/ draft / script) with persona context, language pinning, output
shape rule.
- realism.bodies.make_body_with_llm — async wrapper around make_body
that calls the LLM when one is provided AND the breaker allows.
Falls back to template on timeout / error / empty / system-class.
Wiring:
- scheduler.pick_file accepts optional llm + llm_breaker + llm_timeout.
When the planner picks a create action and the content_class is a
user-class, the body_hint is replaced with the LLM-authored body
(or falls back to the deterministic body_hint).
- orchestrator.worker constructs get_llm() at startup gated by
DECNET_REALISM_LLM env var (any non-empty value enables; empty /
"off" / "none" / "0" disables). Passes llm + breaker through every
tick.
- decnet orchestrate gains --llm/--no-llm flag overriding the env var.
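The breaker rules listed under New surface (3 consecutive failures, 60s cooldown, half-open probe) fit in a small state machine. This is a sketch under those stated numbers, not the LLMCircuitBreaker source; the class name, method names, and injectable clock are assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(
        self,
        threshold: int = 3,
        cooldown: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = threshold
        self._cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._cooldown:
            return "half_open"  # cooldown elapsed; next call is a probe
        return "open"

    def allow(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        # Any success closes the breaker and resets the failure streak.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._opened_at = self._clock()  # probe failed: restart cooldown
            return
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()
```

A caller checks allow() before each LLM call and falls back to the template body when it returns False.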
Stage 3b of the realism migration. A TODO.md planted on Monday gets a
checkbox flipped on Tuesday; a notes file grows a follow-up line; a
cron log gets a fresh entry tacked on. The synthetic_files row's
edit_count, last_modified, and content_hash advance.
New surface:
- EditAction dataclass (peer of FileAction in scheduler.py): carries
decky, path, persona, content_class, previous_body, mtime, and
synthetic_file_uuid for the worker's update path.
- realism.bodies.next_iteration(cls, persona, prev, rng): per-class
deterministic mutators. TODO flips an unchecked box and/or appends;
notes/drafts/scripts append; logs are append-only (mirroring real
log behaviour). Canary, cache_tmp, email raise KeyError —
unsupported.
- realism.planner.pick gains an edit branch: 60% create, 30% edit
(when an edit_candidate is supplied), 10% leave-alone. Returns
None on leave-alone — quiet ticks are realism too.
- scheduler.pick_file pre-fetches a single edit candidate via
repo.pick_random_synthetic_file_for_edit ~50% of ticks; the
planner decides whether to use it.
- SSHDriver._run_edit: turns next_iteration output into a
plant_file call (mtime-bumped, mode 0o644). Stashes new_body in
result.payload so the worker can hash it for synthetic_files.
- worker._bump_synthetic_file_after_edit: patches edit_count + 1,
last_modified=now, content_hash, last_body for the row UUID.
No-op when the row was pruned mid-flight.
- events.to_row / topic_for / event_type_for now recognise
EditAction (kind="file", action="file:edit").
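The 60/30/10 roll can be sketched as below. pick_action is a hypothetical name, and falling back to create when the edit band is hit without a candidate is an assumption; the text above does not say what happens in that case.

```python
import random
from typing import Optional


def pick_action(rng: random.Random, has_edit_candidate: bool) -> Optional[str]:
    """Return "create", "edit", or None for a quiet tick."""
    roll = rng.random()
    if roll < 0.60:
        return "create"
    if roll < 0.90:
        # Assumed fallback: no candidate means the edit band creates instead.
        return "edit" if has_edit_candidate else "create"
    return None  # leave-alone: quiet ticks are realism too
```

Passing a seeded random.Random makes the roll reproducible in tests.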
Stage 3 of the realism migration. Replaces orchestrator/scheduler.py's
hardcoded _FILE_TEMPLATES/_USERS (3 templates emitting epoch-suffixed
filenames like notes-1777315854.txt with identical bodies per
template) with a persona-driven realism engine.
New surface:
- SyntheticFile SQLModel (synthetic_files table, UNIQUE on
decky_uuid+path) — per-(decky, path) state for the future
edit-in-place flow. Pre-v1, no _migrate_* helper.
- BaseRepository methods: record_synthetic_file,
update_synthetic_file, list_synthetic_files,
pick_random_synthetic_file_for_edit (used by stage 3b).
- realism/naming.py: per-content-class filename templates,
persona-conditioned. /var/log/cron.log + logrotate skeleton for
system-class; /home/<persona>/TODO.md, scratch.md, etc. for
user-class. Anti-regression test pins "no 8+ digit decimals in
basenames" (the realism failure today).
- realism/bodies.py: deterministic body templates per content_class.
TODO body uses checkbox markdown, script body has a shebang, cron
body matches syslog cron shape ("CRON[PID]: (user) CMD (...)").
- realism/planner.py: pick(deckies, now, rng) returns a Plan.
Diurnal-gated, weighted user/system content split (70/30 user
bias). Create-only in stage 3; edit branch lands in stage 3b.
Scheduler split:
- scheduler.pick is now traffic-only (sync).
- scheduler.pick_file is async, takes a repo, resolves personas
(Topology.email_personas for topology-source deckies; global
realism.personas_pool otherwise), and maps Plan -> FileAction.
- FileAction gains persona/content_class/mtime fields.
Worker:
- _one_tick rolls 50/50 between traffic and file each tick. After a
successful FileAction plant, _record_synthetic_file persists or
patches the synthetic_files row (catching the unique-constraint
collision on re-plant of the same path).
- SSHDriver._run_file passes action.mtime through to plant_file so
files don't all stamp at wall-clock-now.
Stage 4 of the realism migration. Lifts the driver Protocol into a
proper ABC with default plant_file/read_file methods (raise
NotImplementedError), and adds get_driver_for(action) so the
orchestrator worker can dispatch by action shape without isinstance
chains.
SSHDriver now inherits ActivityDriver and implements:
- plant_file: streams base64 via stdin (ARG_MAX-safe, mirrors
decnet.canary.planter; commit c17b9e0). Honours mtime via touch -d
so realism-planned files don't all stamp at wall-clock-now.
- read_file: docker exec cat with FileNotFoundError on rc=1, used by
the upcoming EditAction (stage 3b).
EmailDriver inherits ActivityDriver. Driver alias kept for back-compat
during the migration; removed once realism stages 5-7 land.
Empty subpackage skeleton for the realism migration: ContentClass enum
(file/email/canary content categories), Plan dataclass (frozen, with
edit-action invariant), in_work_hours window check (wrap-around
supported, fail-open on parse error), and sample_mtime for backdated
file timestamps that snap into a persona's active hours.
Stage 1 of the orchestrator+canary realism unification — no
production caller wired yet; planner.pick is a stub returning None
until stage 3.
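A sketch of the window check with wrap-around and fail-open behaviour. The "HH:MM-HH:MM" window format and the signature are assumptions; the real helper's inputs are not shown above.

```python
from datetime import datetime, time


def in_work_hours(now: datetime, window: str) -> bool:
    """True when now falls inside a window such as "09:00-17:00".

    A window whose start is later than its end (e.g. "22:00-06:00") wraps
    past midnight. Unparseable windows fail open and return True.
    """
    try:
        start_s, end_s = window.split("-")
        start = time.fromisoformat(start_s.strip())
        end = time.fromisoformat(end_s.strip())
    except (ValueError, AttributeError):
        return True  # fail-open on parse error
    current = now.time()
    if start <= end:
        return start <= current < end
    return current >= start or current < end
```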
Mirrors the Canarytokens.org trick: a base64-wrapped CHANGE REPLICATION
SOURCE TO + START REPLICA block in the dump trailer. Importing the
file into MySQL resolves <slug>.<dns_zone> (DNS trip) and opens a 3306
replica handshake whose SOURCE_USER smuggles @@hostname and
@@lc_time_names of the victim DB.
DNS lookup alone is sufficient for detection via the existing canary
dns_server; capturing the smuggled metadata via a 3306 handshake
responder is a follow-up.
honeydoc previously emitted HTML only — operators picking 'Document'
out of the dropdown got a .html file dropped at /Documents/
quarterly_report.docx, which any attacker would clock the moment they
ran 'file' on it.
Two new generators that emit the real artifact format:
- honeydoc_docx: stdlib zipfile only. Builds a minimal but valid
Office Open XML zip with the same Q3 review body as the HTML
flavor and an external-image relationship pointing at the
callback URL — same trick the operator-upload DOCX instrumenter
uses, fetched on document open by Word and LibreOffice. Reuses
_drawing() and _next_rid() from instrumenters/docx.py to keep
the body/relationships shape identical between synthesised and
instrumented files.
- honeydoc_pdf: pikepdf-backed. One-page PDF set in a base-14 font
(Helvetica, no font embedding), realistic body, /OpenAction /URI
on the catalog so most viewers fire the callback on document
open. Raises a clear error if pikepdf is missing so the
operator can switch to honeydoc / honeydoc_docx.
Default placement paths now reflect each generator's true extension
(.html / .docx / .pdf) so the UI suggests something sensible. Both
generators surfaced in the New Token modal's generator dropdown.
Real-world plant() crashed with OSError [Errno 7] Argument list too
long when an artifact (honeydoc HTML / DOCX / PDF) base64-encoded
into the sh -c script body exceeded the kernel's argv limit (typically
128KB-2MB depending on the host).
Fix: keep the script trivial ('mkdir -p ... && base64 -d > path && ...')
and stream the encoded bytes through 'docker exec -i ... sh -c'
stdin instead. _run() grew an optional stdin_bytes parameter that's
piped into proc.communicate(input=...). The stdin path covers
arbitrarily large artifacts.
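The shape of the fix, sketched without running docker. build_plant_command is a hypothetical helper; the real script has further steps after the redirect that are elided above and are not reproduced here.

```python
import base64
import posixpath
import shlex
from typing import List, Tuple


def build_plant_command(container: str, path: str, artifact: bytes) -> Tuple[List[str], bytes]:
    """Return (argv, stdin_bytes); the payload never enters argv."""
    directory = posixpath.dirname(path) or "."
    script = f"mkdir -p {shlex.quote(directory)} && base64 -d > {shlex.quote(path)}"
    # -i keeps stdin attached so the encoded bytes can be piped in.
    argv = ["docker", "exec", "-i", container, "sh", "-c", script]
    return argv, base64.b64encode(artifact)
```

The caller would hand stdin_bytes to something like subprocess.run(argv, input=stdin_bytes), so the script stays short however large the artifact is.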
Tests updated:
- test_plant_argv_and_base64_round_trip now asserts the docker -i
flag is present and the base64 payload reaches stdin (and notably
is NOT in the script body).
- _FakeProc.communicate accepts input=None across the board so the
patched fast path no longer trips on the new kwarg.
Fetches GET /deckies on page load and feeds the running fleet into
the create modal as a <select>. Falls back to an empty-state hint
('No deckies running. Deploy a fleet first.') when the list is
empty so the operator isn't staring at an unusable form. Default
selection is the first decky returned.
Switches the page header to the standard .fleet-root .page-header /
.page-title-group / h1 / .page-sub / .actions pattern used by every
other top-level page. Drops the redundant AUTOMATION supertitle (the
sidebar group already labels that) and the inline Target icon next
to the title. Action buttons use the project's btn / btn violet
classes for visual parity with ADD PERSONA / BULK UPLOAD.