# BEHAVE-SHELL — Phase 6 smoke

Two halves:

1. **Offline replay** — `smoke.sh` replays the five 2026-05-02 calibration shards through the production handler. Exercises the engine + storage layer end-to-end without a live PTY. Suitable for CI.
2. **Live decky round-trip** — manual procedure below. Confirms the bus / collector / disk-reach plumbing on a real session.

## 1. Offline replay

```sh
$ scripts/behave_shell/smoke.sh                            # auto-discovers ../BEHAVE/prototype_extractors/shell
$ scripts/behave_shell/smoke.sh /path/to/calibration/dir   # explicit dir
```

Expected output (15 sessions across 5 classes, 424 total observations on the current corpus):

```
[HUMAN] sessions=1 observations=34 distinct_primitives=34
[YOU-sim] sessions=2 observations=59 distinct_primitives=34
[LW-sim] sessions=5 observations=136 distinct_primitives=34
[CLAUDE-FF] sessions=3 observations=84 distinct_primitives=34
[CLAUDE-CL] sessions=4 observations=111 distinct_primitives=34
smoke: OK — all classes emit observations end-to-end
```

Exit codes: `0` full pass, `1` any class regressed, `2` argument / IO error.

The replay drives `decnet.profiler.behave_shell._handler.handle_session_ended` directly against a temp SQLite DB seeded with one Attacker per session. Bus emission is captured by an in-process publisher; no real bus is required.

## 2. Live decky round-trip (manual)

End-to-end confirmation. Run **once** before tagging v0 and **after** any change to the bus / collector / disk-reach layer.

### Setup

1. Init a fresh DECNET host (see `decnet init`).
2. `decnet bus` worker is up (systemd unit `decnet-bus.service` or `scripts/bus/smoke.sh`).
3. `decnet-profiler.service` is up — it owns the `attacker.session.ended` subscription and the BEHAVE-SHELL handler.
4. `decnet-collector.service` is up — it publishes `attacker.session.ended` from `session_recorded` log events.
5. Web API is up; you have a viewer JWT in your browser localStorage.
6. Deploy a single `ssh` decky:

   ```sh
   $ decnet decky deploy --service ssh --decky smoke-decky
   ```

   The decky's sessrec wrapper appends to `/var/lib/decnet/artifacts/smoke-decky/ssh/transcripts/sessions-.jsonl`.

### Run one session per calibration class

For each class, SSH into the decky and reproduce the canonical workload. Log out via the documented exit path so the `session_recorded` event fires. The collector aggregates the session and publishes `attacker.session.ended`; the profiler worker disk-reaches the shard, runs `extract_session()`, persists rows, and publishes one `attacker.observation.` per emission.

| Class | Workload sketch | Expected dominant primitives |
|---|---|---|
| HUMAN | Type each command live; correct typos; pause to read output. | `motor.input_modality=typed`, `cognitive.feedback_loop_engagement=closed_loop` |
| YOU-sim | Paste short pre-canned commands at typing speed; minimal repeats. | `motor.input_modality=pasted`, `motor.paste_burst_rate=occasional`, `cognitive.command_branch_diversity=linear_playbook` |
| LW-sim | Paste a recon sweep generated by a small LLM; ~2-8s between pastes. | `cognitive.inter_command_latency_class=llm_lightweight` |
| CLAUDE-FF | Paste outputs from a fire-and-forget reasoning agent; ~8-30s gaps. | `cognitive.inter_command_latency_class=llm_heavyweight`, `cognitive.feedback_loop_engagement=fire_and_forget` |
| CLAUDE-CL | Drive a closed-loop plan-execute-observe agent; >30s pauses on long output. | `cognitive.inter_command_latency_class=long`, `cognitive.feedback_loop_engagement=closed_loop` |

### Verify

For each class, after disconnecting:

1. **DB row landing** — within ~30s (the profiler tick interval), `observations` carries one row per primitive for the new attacker:

   ```sh
   $ sqlite3 /var/lib/decnet/decnet.db \
       "SELECT primitive, value, confidence FROM observations \
        WHERE evidence_ref LIKE 'shard:smoke-decky/%' ORDER BY ts DESC LIMIT 40;"
   ```

2. **Bus events** — tail the bus worker log; you should see one `attacker.observation.` per emitted row, plus the originating `attacker.session.ended`.
3. **AttackerDetail panel** — open `/attackers/` in the browser. The Behavioural primitives section should hydrate from the REST snapshot and live-update each time you replay the session (the SSE route forwards the new emissions in real time).

### Pass criteria

* All 5 classes produce ≥ 27 distinct primitives in `observations` (the per-shard hard gate from `tests/profiler/behave_shell/test_calibration_grid.py`).
* The four day-one priority primitives appear in the panel and carry the expected values per class (table above).
* No collector / profiler / web errors in the journal during the round-trip.

If any class regresses: roll back the last commit and run the offline replay (`smoke.sh`) to localise — same handler, no transport noise.
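
When falling back to the offline replay, its per-class summary can be checked mechanically rather than by eye. The following is a minimal sketch and not part of the repository: `check_smoke_summary` is a hypothetical helper, and it assumes `smoke.sh` writes its per-class lines to stdout in exactly the format shown under "Expected output". Applying the ≥ 27 distinct-primitive threshold to the replay summary is likewise an assumption borrowed from the pass criteria above.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with the repo): reads smoke.sh-style
# summary lines on stdin, for example
#   [HUMAN] sessions=1 observations=34 distinct_primitives=34
# and checks each class against a minimum distinct-primitive count.
check_smoke_summary() {
    # $1 = minimum distinct primitives per class (defaults to 27)
    awk -v min="${1:-27}" '
        /^\[/ {
            classes++
            # Fields 2..NF are key=value pairs; $1 is the class label.
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "sessions")     sessions += kv[2]
                if (kv[1] == "observations") obs += kv[2]
                if (kv[1] == "distinct_primitives" && kv[2] + 0 < min + 0) {
                    printf "FAIL %s distinct_primitives=%s < %s\n", $1, kv[2], min
                    bad = 1
                }
            }
        }
        END {
            # Totals across all class lines, then a non-zero exit on any failure.
            printf "classes=%d sessions=%d observations=%d\n", classes, sessions, obs
            exit (bad ? 1 : 0)
        }
    '
}
```

Assuming the script prints to stdout, usage would look like `scripts/behave_shell/smoke.sh | check_smoke_summary 27`. On the expected output listed in section 1 this prints `classes=5 sessions=15 observations=424` and returns `0`; any class below the threshold produces a `FAIL` line and a return status of `1`.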