Stage 3 of the realism migration. Replaces orchestrator/scheduler.py's
hardcoded _FILE_TEMPLATES/_USERS (3 templates emitting epoch-suffixed
filenames like notes-1777315854.txt with identical bodies per
template) with a persona-driven realism engine.
New surface:
- SyntheticFile SQLModel (synthetic_files table, UNIQUE on
  decky_uuid+path): per-(decky, path) state for the future
  edit-in-place flow. Pre-v1, no _migrate_* helper.
- BaseRepository methods: record_synthetic_file,
  update_synthetic_file, list_synthetic_files,
  pick_random_synthetic_file_for_edit (used by stage 3b).
- realism/naming.py: per-content-class filename templates,
  persona-conditioned. /var/log/cron.log + logrotate skeleton for
  system-class; /home/<persona>/TODO.md, scratch.md, etc. for
  user-class. Anti-regression test pins "no 8+ digit decimals in
  basenames" (the realism failure today).
- realism/bodies.py: deterministic body templates per content_class.
  TODO body uses checkbox markdown, script body has a shebang, cron
  body matches syslog cron shape ("CRON[PID]: (user) CMD (...)").
- realism/planner.py: pick(deckies, now, rng) returns a Plan.
  Diurnal-gated, weighted user/system content split (70/30 user
  bias). Create-only in stage 3; edit branch lands in stage 3b.
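The "no 8+ digit decimals in basenames" rule from the naming bullet above could be checked roughly as follows. This is a minimal sketch, not the repository's actual test: the helper name `has_epoch_like_run` and the exact regex are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Hypothetical helper; the real anti-regression test may be written differently.
# A run of 8 or more decimal digits is treated as "looks like an epoch suffix".
_EPOCH_LIKE = re.compile(r"\d{8,}")


def has_epoch_like_run(path: str) -> bool:
    """Return True if the basename of `path` contains 8+ consecutive digits."""
    basename = PurePosixPath(path).name
    return _EPOCH_LIKE.search(basename) is not None


# The old generator's output trips the check; the new-style names do not.
old_style = has_epoch_like_run("/home/alice/notes-1777315854.txt")
new_style = has_epoch_like_run("/home/alice/TODO.md")
system_style = has_epoch_like_run("/var/log/cron.log")
```

Only the basename is inspected, so a long numeric directory component would not trigger the check.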
Scheduler split:
- scheduler.pick is now traffic-only (sync).
- scheduler.pick_file is async, takes a repo, resolves personas
  (Topology.email_personas for topology-source deckies; global
  realism.personas_pool otherwise), and maps Plan -> FileAction.
- FileAction gains persona/content_class/mtime fields.
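The persona-resolution rule in pick_file (topology personas for topology-source deckies, the global pool otherwise) might look something like the sketch below. `DeckyStub`, its `source` and `topology_personas` fields, and `resolve_personas` are all hypothetical stand-ins, and falling back to the global pool when a topology lists no personas is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class DeckyStub:
    """Hypothetical stand-in for a decky; not the project's real model."""
    name: str
    source: str  # assumed values: "topology" or anything else
    topology_personas: list[str] = field(default_factory=list)


def resolve_personas(decky: DeckyStub, global_pool: list[str]) -> list[str]:
    """Pick the persona list a file plan for `decky` should draw from."""
    if decky.source == "topology" and decky.topology_personas:
        return list(decky.topology_personas)
    # Assumption: non-topology deckies, and topology deckies whose topology
    # lists no personas, fall back to the global pool.
    return list(global_pool)


pool = ["alice", "bob"]
from_topology = resolve_personas(
    DeckyStub("web-01", "topology", ["carol"]), pool
)
from_pool = resolve_personas(DeckyStub("db-01", "standalone"), pool)
```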
Worker:
- _one_tick rolls 50/50 between traffic and file each tick. After a
  successful FileAction plant, _record_synthetic_file persists or
  patches the synthetic_files row (catching the unique-constraint
  collision on re-plant of the same path).
- SSHDriver._run_file passes action.mtime through to plant_file so
  files don't all stamp at wall-clock-now.
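The persist-or-patch behavior described for _record_synthetic_file can be illustrated with the standard-library sqlite3 module. This is a sketch under assumptions, not the worker's code: the real implementation goes through the repository layer, the table here carries only a few of the columns, and bumping edit_count on a re-plant is assumed.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with the same UNIQUE(decky_uuid, path) shape."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE synthetic_files (
            decky_uuid   TEXT NOT NULL,
            path         TEXT NOT NULL,
            content_hash TEXT NOT NULL,
            edit_count   INTEGER NOT NULL DEFAULT 0,
            UNIQUE (decky_uuid, path)
        )
        """
    )
    return conn


def record_or_patch(conn, decky_uuid: str, path: str, content_hash: str) -> None:
    """Insert a row; on a unique-constraint collision, patch the existing row."""
    try:
        conn.execute(
            "INSERT INTO synthetic_files (decky_uuid, path, content_hash) "
            "VALUES (?, ?, ?)",
            (decky_uuid, path, content_hash),
        )
    except sqlite3.IntegrityError:
        # Assumption: a re-plant of the same path counts as one edit.
        conn.execute(
            "UPDATE synthetic_files "
            "SET content_hash = ?, edit_count = edit_count + 1 "
            "WHERE decky_uuid = ? AND path = ?",
            (content_hash, decky_uuid, path),
        )
    conn.commit()


conn = make_demo_db()
record_or_patch(conn, "decky-1", "/home/alice/TODO.md", "hash-a")
record_or_patch(conn, "decky-1", "/home/alice/TODO.md", "hash-b")  # same path again
rows = conn.execute(
    "SELECT content_hash, edit_count FROM synthetic_files"
).fetchall()
```

After the second call there is still a single row for the path, now carrying the newer hash.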
73 lines
2.9 KiB
Python
"""Realism: synthetic-file state across orchestrator ticks.

The orchestrator's pre-realism file generator forgot every file the
moment it was planted: each tick wrote a brand-new ``notes-{ts}.txt``
with a literal unix-epoch suffix. No edits, no rotation, no diurnal
shape: three of the realism failures the migration is fixing.

:class:`SyntheticFile` is the per-(decky, path) memory that lets the
realism engine read back yesterday's ``TODO.md``, mutate it, write
back the new body, and let the dashboard inspect the lineage.

Pre-v1: schema lives directly in the SQLModel; no ``_migrate_*``
helper (per the project's "no new migrations pre-v1" rule:
``feedback_no_new_migrations_prev1.md``). Alembic lands at v1.
"""
from datetime import datetime, timezone
from typing import Any, List
from uuid import uuid4

from pydantic import BaseModel
from sqlalchemy import Column, Index, Text, UniqueConstraint
from sqlmodel import Field, SQLModel


class SyntheticFile(SQLModel, table=True):
    """One realism-planted file on one decky.

    The unique key is ``(decky_uuid, path)``: there's at most one
    realism record per location, even if the planter has rotated the
    file (rotation updates ``edit_count`` and ``last_modified``, not
    a new row).

    ``last_body`` is capped: large blobs (DOCX/PDF, future canary
    artifacts) are truncated at write time. The edit-in-place flow
    (stage 3b) only needs the body when the content class supports
    body-level mutation (``note``, ``todo``, ``draft``, ``script``),
    so storing the canonical bytes for binary blobs would be wasted.

    ``content_hash`` is sha256 of the *body bytes only* (never of
    metadata or wrapper headers), so a hash compare is a cheap
    "did the body change?" check across edits.
    """
    __tablename__ = "synthetic_files"
    __table_args__ = (
        UniqueConstraint(
            "decky_uuid", "path", name="uq_synthetic_files_decky_path",
        ),
        Index("ix_synthetic_files_decky_modified", "decky_uuid", "last_modified"),
    )
    uuid: str = Field(default_factory=lambda: str(uuid4()), primary_key=True)
    decky_uuid: str = Field(index=True, max_length=64)
    path: str = Field(max_length=1024)
    persona: str = Field(max_length=128)  # EmailPersona.name
    content_class: str = Field(max_length=32, index=True)  # ContentClass enum value
    created_at: datetime = Field(
        default_factory=lambda: datetime.now(timezone.utc), index=True,
    )
    last_modified: datetime = Field(
        default_factory=lambda: datetime.now(timezone.utc),
    )
    edit_count: int = Field(default=0)
    content_hash: str = Field(max_length=64)  # sha256 hex
    last_body: str = Field(
        sa_column=Column("last_body", Text, nullable=False, default="")
    )


class SyntheticFilesResponse(BaseModel):
    total: int
    limit: int
    offset: int
    data: List[dict[str, Any]]
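The SyntheticFile docstring describes two write-time rules: content_hash is a sha256 over the body bytes only, and last_body is capped. A caller might derive both fields along the lines of the sketch below. The helper name `body_fields`, the `MAX_BODY_CHARS` value, and the choice to hash the full body before truncating are assumptions; the module itself does not state the cap or where truncation happens.

```python
import hashlib

# Hypothetical cap; the real limit is not stated in this module.
MAX_BODY_CHARS = 64 * 1024


def body_fields(body: bytes, cap: int = MAX_BODY_CHARS) -> tuple[str, str]:
    """Return (content_hash, last_body) for a planted file body."""
    # Hash the raw body bytes only: no path, mtime, or wrapper headers.
    content_hash = hashlib.sha256(body).hexdigest()
    # Assumption: the stored body is a lossy text rendering, truncated to the cap.
    last_body = body.decode("utf-8", errors="replace")[:cap]
    return content_hash, last_body


before_hash, _ = body_fields(b"- [ ] rotate logs\n")
after_hash, _ = body_fields(b"- [x] rotate logs\n")
body_changed = before_hash != after_hash  # the cheap "did the body change?" check
```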