Two-half deliverable per BEHAVE-INTEGRATION.md §587-594:

* scripts/behave_shell/replay_calibration.py: Python helper that drives the production handler against one asciinema shard, mints a temp SQLite repo + an Attacker per session, and captures bus emissions in-process. Exits non-zero on zero-observation sessions.
* scripts/behave_shell/smoke.sh: bash entry that replays all five 2026-05-02 calibration shards (HUMAN / YOU-sim / LW-sim / CLAUDE-FF / CLAUDE-CL). Auto-activates .311 venv, forces DECNET_DB_TYPE=sqlite, prints per-class summary. Suitable for CI.
* scripts/behave_shell/README.md: runbook covering both halves. Pins the manual live-decky procedure (one SSH session per class against a deployed smoke-decky, expected dominant primitives table, SQL verification query, AttackerDetail panel check, pass criteria).
* BEHAVE-INTEGRATION.md: Phase 6 completion log appended with current corpus results table (15 sessions, 424 observations across the five classes) and a note that the v0 tag (drop -pre) is gated on the manual live-decky round-trip and lands as a separate commit.

Live-decky run is intentionally NOT scripted: the integration doc calls for manual SSH sessions per class so an operator confirms the bus / collector / disk-reach plumbing under real PTY conditions.
BEHAVE-SHELL — Phase 6 smoke
Two halves:
- Offline replay: smoke.sh replays the five 2026-05-02 calibration shards through the production handler. Exercises the engine + storage layer end-to-end without a live PTY. Suitable for CI.
- Live decky round-trip: manual procedure below. Confirms the bus / collector / disk-reach plumbing on a real session.
1. Offline replay
$ scripts/behave_shell/smoke.sh # auto-discovers ../BEHAVE/prototype_extractors/shell
$ scripts/behave_shell/smoke.sh /path/to/calibration/dir # explicit dir
Expected output (15 sessions across 5 classes, 424 total observations on the current corpus):
[HUMAN] sessions=1 observations=34 distinct_primitives=34
[YOU-sim] sessions=2 observations=59 distinct_primitives=34
[LW-sim] sessions=5 observations=136 distinct_primitives=34
[CLAUDE-FF] sessions=3 observations=84 distinct_primitives=34
[CLAUDE-CL] sessions=4 observations=111 distinct_primitives=34
smoke: OK — all classes emit observations end-to-end
Exit codes: 0 = full pass, 1 = any class regressed, 2 = argument / IO error.
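A CI job may want to cross-check the per-class summary against the expected totals rather than rely on the exit code alone. The following is a minimal sketch, assuming the summary lines keep exactly the `[CLASS] sessions=N observations=N distinct_primitives=N` shape shown above. `parse_summary` and `totals` are hypothetical helper names for illustration and are not part of the repo.

```python
import re

# Assumed line shape, copied from the expected output above.
SUMMARY_RE = re.compile(
    r"^\[(?P<cls>[^\]]+)\]\s+sessions=(?P<sessions>\d+)\s+"
    r"observations=(?P<observations>\d+)\s+"
    r"distinct_primitives=(?P<distinct>\d+)$"
)
FIELDS = ("sessions", "observations", "distinct")


def parse_summary(text):
    """Return {class_name: {"sessions": n, "observations": n, "distinct": n}}."""
    summary = {}
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:  # lines that are not per-class summaries are skipped
            summary[match["cls"]] = {f: int(match[f]) for f in FIELDS}
    return summary


def totals(summary):
    """Return (total_sessions, total_observations) across all classes."""
    return (
        sum(row["sessions"] for row in summary.values()),
        sum(row["observations"] for row in summary.values()),
    )


# The five per-class lines from the expected output above.
SAMPLE = """\
[HUMAN] sessions=1 observations=34 distinct_primitives=34
[YOU-sim] sessions=2 observations=59 distinct_primitives=34
[LW-sim] sessions=5 observations=136 distinct_primitives=34
[CLAUDE-FF] sessions=3 observations=84 distinct_primitives=34
[CLAUDE-CL] sessions=4 observations=111 distinct_primitives=34
"""

summary = parse_summary(SAMPLE)
print(totals(summary))  # (15, 424)
```

In practice the text would come from the captured stdout of smoke.sh instead of the inline sample.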
The replay drives decnet.profiler.behave_shell._handler.handle_session_ended
directly against a temp SQLite DB seeded with one Attacker per
session. Bus emission is captured by an in-process publisher; no
real bus is required.
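The capture pattern described above (temp SQLite DB, one seeded Attacker, in-process publisher) can be sketched as follows. This is an illustration only: `CapturingPublisher`, `stand_in_handler`, `replay_one`, and the two-table schema are all hypothetical, and the real `handle_session_ended` signature and DECNET schema are not shown in this document.

```python
import os
import sqlite3
import tempfile


class CapturingPublisher:
    """Hypothetical bus stand-in: records events instead of sending them."""

    def __init__(self):
        self.events = []

    def publish(self, topic, payload):
        self.events.append((topic, payload))


def stand_in_handler(conn, publisher, attacker_id, emissions):
    """Placeholder for the real handler: persist each row, then publish it."""
    for primitive, value in emissions:
        conn.execute(
            "INSERT INTO observations (attacker_id, primitive, value) "
            "VALUES (?, ?, ?)",
            (attacker_id, primitive, value),
        )
        publisher.publish(
            f"attacker.observation.{primitive}",
            {"attacker_id": attacker_id, "value": value},
        )
    conn.commit()


def replay_one(emissions, attacker_id="attacker-1"):
    """Run one session against a throwaway DB; return (row_count, events)."""
    publisher = CapturingPublisher()
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "replay.db"))
        try:
            # Assumed minimal schema, for illustration only.
            conn.execute("CREATE TABLE attackers (id TEXT PRIMARY KEY)")
            conn.execute(
                "CREATE TABLE observations "
                "(attacker_id TEXT, primitive TEXT, value TEXT)"
            )
            conn.execute("INSERT INTO attackers (id) VALUES (?)", (attacker_id,))
            stand_in_handler(conn, publisher, attacker_id, emissions)
            rows = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
        finally:
            conn.close()  # close before the temp directory is removed
    return rows, publisher.events


def exit_code(rows):
    """Mirror the documented behaviour: zero observations is a failure."""
    return 0 if rows > 0 else 1


rows, events = replay_one([("motor.input_modality", "typed")])
print(rows, [topic for topic, _ in events], exit_code(rows))
```

Because the publisher only appends to a list, a replay can assert on the exact topics emitted without a bus worker running.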
2. Live decky round-trip (manual)
End-to-end confirmation. Run once before tagging v0 and after any change to the bus / collector / disk-reach layer.
Setup
- Init a fresh DECNET host (see decnet init).
- decnet bus worker is up (systemd unit decnet-bus.service or scripts/bus/smoke.sh).
- decnet-profiler.service is up; it owns the attacker.session.ended subscription and the BEHAVE-SHELL handler.
- decnet-collector.service is up; it publishes attacker.session.ended from session_recorded log events.
- Web API is up; you have a viewer JWT in your browser localStorage.
- Deploy a single ssh decky:

      $ decnet decky deploy --service ssh --decky smoke-decky

  The decky's sessrec wrapper appends to /var/lib/decnet/artifacts/smoke-decky/ssh/transcripts/sessions-<UTC-DAY>.jsonl.
Run one session per calibration class
For each class, SSH into the decky and reproduce the canonical
workload. Log out via the documented exit path so the
session_recorded event fires. The collector aggregates the session
and publishes attacker.session.ended; the profiler worker
disk-reaches the shard, runs extract_session(), persists rows,
publishes one attacker.observation.<primitive> per emission.
| Class | Workload sketch | Expected dominant primitives |
|---|---|---|
| HUMAN | Type each command live; correct typos; pause to read output. | motor.input_modality=typed, cognitive.feedback_loop_engagement=closed_loop |
| YOU-sim | Paste short pre-canned commands at typing speed; minimal repeats. | motor.input_modality=pasted, motor.paste_burst_rate=occasional, cognitive.command_branch_diversity=linear_playbook |
| LW-sim | Paste a recon sweep generated by a small LLM; ~2-8s between pastes. | cognitive.inter_command_latency_class=llm_lightweight |
| CLAUDE-FF | Paste outputs from a fire-and-forget reasoning agent; ~8-30s gaps. | cognitive.inter_command_latency_class=llm_heavyweight, cognitive.feedback_loop_engagement=fire_and_forget |
| CLAUDE-CL | Drive a closed-loop plan-execute-observe agent; >30s pauses on long output. | cognitive.inter_command_latency_class=long, cognitive.feedback_loop_engagement=closed_loop |
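When checking a live session against the table above, the expected values can be encoded once and diffed against whatever the observations query returns. This is a minimal sketch: the `EXPECTED` mapping is transcribed from the table, while `mismatches` and the shape of `observed` (a plain primitive-to-value dict) are assumptions made for illustration.

```python
# Expected dominant primitives per class, transcribed from the table above.
EXPECTED = {
    "HUMAN": {
        "motor.input_modality": "typed",
        "cognitive.feedback_loop_engagement": "closed_loop",
    },
    "YOU-sim": {
        "motor.input_modality": "pasted",
        "motor.paste_burst_rate": "occasional",
        "cognitive.command_branch_diversity": "linear_playbook",
    },
    "LW-sim": {
        "cognitive.inter_command_latency_class": "llm_lightweight",
    },
    "CLAUDE-FF": {
        "cognitive.inter_command_latency_class": "llm_heavyweight",
        "cognitive.feedback_loop_engagement": "fire_and_forget",
    },
    "CLAUDE-CL": {
        "cognitive.inter_command_latency_class": "long",
        "cognitive.feedback_loop_engagement": "closed_loop",
    },
}


def mismatches(class_name, observed):
    """Return {primitive: (expected, actual)} for every expectation not met.

    `observed` is assumed to map primitive name -> value for one session;
    a primitive that was never emitted shows up with actual=None.
    """
    return {
        primitive: (want, observed.get(primitive))
        for primitive, want in EXPECTED[class_name].items()
        if observed.get(primitive) != want
    }


observed = {
    "motor.input_modality": "typed",
    "cognitive.feedback_loop_engagement": "fire_and_forget",
}
print(mismatches("HUMAN", observed))
# {'cognitive.feedback_loop_engagement': ('closed_loop', 'fire_and_forget')}
```

An empty dict means the session carries every expected dominant value for its class.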
Verify
For each class, after disconnecting:
- DB row landing: within ~30s (the profiler tick interval), the observations table carries one row per primitive for the new attacker:

      $ sqlite3 /var/lib/decnet/decnet.db \
          "SELECT primitive, value, confidence FROM observations \
           WHERE evidence_ref LIKE 'shard:smoke-decky/%' ORDER BY ts DESC LIMIT 40;"

- Bus events: tail the bus worker log; you should see one attacker.observation.<primitive> per emitted row, plus the originating attacker.session.ended.
- AttackerDetail panel: open /attackers/<uuid> in the browser. The Behavioural primitives section should hydrate from the REST snapshot and live-update each time you replay the session (the SSE route forwards the new emissions in real time).
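The DB check can also be run from Python when the sqlite3 CLI is not installed on the host. The sketch below assumes only the columns named in the query above (primitive, value, confidence, evidence_ref, ts). The in-memory table and its rows are stand-ins so the example runs on its own, and `recent_observations` is a hypothetical helper name.

```python
import sqlite3

# Same query as the CLI check, with the shard prefix passed as a parameter.
QUERY = """
SELECT primitive, value, confidence FROM observations
WHERE evidence_ref LIKE ? ORDER BY ts DESC LIMIT 40
"""


def recent_observations(conn, decky="smoke-decky"):
    """Return the newest observation rows whose evidence points at `decky`."""
    return conn.execute(QUERY, (f"shard:{decky}/%",)).fetchall()


# Stand-in database; on a real host, connect to /var/lib/decnet/decnet.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations "
    "(primitive TEXT, value TEXT, confidence REAL, evidence_ref TEXT, ts INTEGER)"
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
    [
        # Illustrative rows only; evidence_ref suffixes are made up.
        ("motor.input_modality", "typed", 0.9, "shard:smoke-decky/a", 1),
        ("cognitive.feedback_loop_engagement", "closed_loop", 0.8, "shard:smoke-decky/a", 2),
        ("motor.input_modality", "pasted", 0.7, "shard:other-decky/b", 3),
    ],
)

for primitive, value, confidence in recent_observations(conn):
    print(primitive, value, confidence)
conn.close()
```

Binding the LIKE pattern as a parameter keeps the decky name out of the SQL string, which matters if the helper is reused for deckies other than smoke-decky.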
Pass criteria
- All 5 classes produce ≥ 27 distinct primitives in observations (the per-shard hard gate from tests/profiler/behave_shell/test_calibration_grid.py).
- The four day-one priority primitives appear in the panel and carry the expected values per class (table above).
- No collector / profiler / web errors in the journal during the round-trip.
If any class regresses: roll back the last commit and run the offline
replay (smoke.sh) to localise the fault. It exercises the same handler without transport noise.
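The distinct-primitive gate from the pass criteria can be sketched as a small check over rows grouped by class. The threshold of 27 comes from the criteria above. `failing_classes`, the row shape, and the synthetic primitive names in the demo are assumptions for illustration, not the contents of test_calibration_grid.py.

```python
MIN_DISTINCT = 27  # per-shard hard gate quoted in the pass criteria


def failing_classes(rows_by_class, min_distinct=MIN_DISTINCT):
    """Return {class_name: distinct_count} for classes below the gate.

    `rows_by_class` is assumed to map class name -> iterable of
    (primitive, value) rows pulled from the observations table.
    """
    failing = {}
    for class_name, rows in rows_by_class.items():
        distinct = len({primitive for primitive, _value in rows})
        if distinct < min_distinct:
            failing[class_name] = distinct
    return failing


# Synthetic rows: placeholder primitive names, not real ones.
demo = {
    "HUMAN": [(f"primitive_{i}", "v") for i in range(34)],
    "LW-sim": [(f"primitive_{i % 20}", "v") for i in range(136)],
}
print(failing_classes(demo))  # {'LW-sim': 20}
```

Counting a set of primitive names means repeated emissions of the same primitive across sessions do not inflate the figure.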