
anti 69c8cfd2b9 test(profiler/behave_shell): Phase 6 smoke harness + live-decky runbook

Two-half deliverable per BEHAVE-INTEGRATION.md §587-594:

* scripts/behave_shell/replay_calibration.py — Python helper that
  drives the production handler against one asciinema shard, mints
  a temp SQLite repo + an Attacker per session, captures bus
  emissions in-process. Exits non-zero on zero-observation sessions.

* scripts/behave_shell/smoke.sh — bash entry that replays all five
  2026-05-02 calibration shards (HUMAN / YOU-sim / LW-sim /
  CLAUDE-FF / CLAUDE-CL). Auto-activates .311 venv, forces
  DECNET_DB_TYPE=sqlite, prints per-class summary. Suitable for CI.

* scripts/behave_shell/README.md — runbook covering both halves.
  Pins the manual live-decky procedure (one SSH session per class
  against a deployed smoke-decky, expected dominant primitives table,
  SQL verification query, AttackerDetail panel check, pass criteria).

* BEHAVE-INTEGRATION.md — Phase 6 completion log appended with
  current corpus results table (15 sessions, 424 observations across
  the five classes) and a note that the v0 tag (drop -pre) is gated
  on the manual live-decky round-trip and lands as a separate
  commit.

Live-decky run is intentionally NOT scripted — the integration doc
calls for manual SSH sessions per class so an operator confirms the
bus / collector / disk-reach plumbing under real PTY conditions.

2026-05-08 21:42:11 -04:00


README.md

BEHAVE-SHELL — Phase 6 smoke

Two halves:

Offline replay — smoke.sh replays the five 2026-05-02 calibration shards through the production handler. Exercises the engine + storage layer end-to-end without a live PTY. Suitable for CI.
Live decky round-trip — manual procedure below. Confirms the bus / collector / disk-reach plumbing on a real session.

1. Offline replay

$ scripts/behave_shell/smoke.sh                             # auto-discovers ../BEHAVE/prototype_extractors/shell
$ scripts/behave_shell/smoke.sh /path/to/calibration/dir    # explicit dir

Expected output (15 sessions across 5 classes, 424 total observations on the current corpus):

[HUMAN]      sessions=1 observations=34 distinct_primitives=34
[YOU-sim]    sessions=2 observations=59 distinct_primitives=34
[LW-sim]     sessions=5 observations=136 distinct_primitives=34
[CLAUDE-FF]  sessions=3 observations=84 distinct_primitives=34
[CLAUDE-CL]  sessions=4 observations=111 distinct_primitives=34
smoke: OK — all classes emit observations end-to-end

Exit codes: 0 full pass, 1 any class regressed, 2 argument / IO error.
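A CI job that wants more than the exit code can check the summary lines from a script. The following is a minimal Python sketch that assumes the per-class line format shown above stays stable. `parse_summary`, `exit_code` and the zero-observation rule are illustrative only and are not the logic smoke.sh actually uses.

```python
import re
from dataclasses import dataclass

# One per-class summary line, e.g. "[HUMAN]      sessions=1 observations=34 distinct_primitives=34".
# Class names contain lower-case letters and dashes (YOU-sim), so match anything up to "]".
LINE_RE = re.compile(
    r"^\[(?P<cls>[^\]]+)\]\s+sessions=(?P<sessions>\d+)\s+"
    r"observations=(?P<obs>\d+)\s+distinct_primitives=(?P<distinct>\d+)\s*$"
)

CLASSES = ("HUMAN", "YOU-sim", "LW-sim", "CLAUDE-FF", "CLAUDE-CL")

# The expected output above, minus the trailing "smoke: OK" line.
SAMPLE = "\n".join([
    "[HUMAN]      sessions=1 observations=34 distinct_primitives=34",
    "[YOU-sim]    sessions=2 observations=59 distinct_primitives=34",
    "[LW-sim]     sessions=5 observations=136 distinct_primitives=34",
    "[CLAUDE-FF]  sessions=3 observations=84 distinct_primitives=34",
    "[CLAUDE-CL]  sessions=4 observations=111 distinct_primitives=34",
])


@dataclass
class ClassSummary:
    cls: str
    sessions: int
    observations: int
    distinct_primitives: int


def parse_summary(text: str) -> list:
    """Return one ClassSummary per matching line; any other line is ignored."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append(ClassSummary(m["cls"], int(m["sessions"]),
                                     int(m["obs"]), int(m["distinct"])))
    return rows


def exit_code(rows, expected=CLASSES) -> int:
    """Hypothetical rule: 1 if any expected class is missing or has zero observations, else 0."""
    by_cls = {r.cls: r for r in rows}
    for cls in expected:
        row = by_cls.get(cls)
        if row is None or row.observations == 0:
            return 1
    return 0


if __name__ == "__main__":
    rows = parse_summary(SAMPLE)
    # prints: 15 424 0
    print(sum(r.sessions for r in rows), sum(r.observations for r in rows), exit_code(rows))
```

Summing the per-class lines reproduces the corpus totals quoted above: 15 sessions and 424 observations.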

The replay drives decnet.profiler.behave_shell._handler.handle_session_ended directly against a temp SQLite DB seeded with one Attacker per session. Bus emission is captured by an in-process publisher; no real bus is required.
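The in-process capture can be sketched as follows. This sketch assumes a hypothetical publisher interface with a `publish(topic, payload)` method and a `session_id` key in each payload. The real DECNET publisher may look different. The helper flags sessions that produced no observations, which mirrors the replay's non-zero exit on zero-observation sessions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple

OBS_PREFIX = "attacker.observation."


@dataclass
class CapturingPublisher:
    """Stand-in for the bus: stores each (topic, payload) pair in memory instead of sending it.

    The publish(topic, payload) signature is an assumption, not the DECNET bus API.
    """
    events: List[Tuple[str, Dict[str, Any]]] = field(default_factory=list)

    def publish(self, topic: str, payload: Dict[str, Any]) -> None:
        self.events.append((topic, payload))

    def primitives_by_session(self) -> Dict[str, Set[str]]:
        """Map each session_id (assumed payload key) to the primitives observed for it."""
        grouped: Dict[str, Set[str]] = defaultdict(set)
        for topic, payload in self.events:
            if topic.startswith(OBS_PREFIX):
                grouped[payload["session_id"]].add(topic[len(OBS_PREFIX):])
        return dict(grouped)


def zero_observation_sessions(pub: CapturingPublisher, session_ids) -> List[str]:
    """Sessions with no attacker.observation.* event; the replay treats these as failures."""
    seen = pub.primitives_by_session()
    return [sid for sid in session_ids if sid not in seen]
```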

2. Live decky round-trip (manual)

End-to-end confirmation. Run once before tagging v0 and after any change to the bus / collector / disk-reach layer.

Setup

Init a fresh DECNET host (see decnet init).
decnet bus worker is up (systemd unit decnet-bus.service or scripts/bus/smoke.sh).
decnet-profiler.service is up — it owns the attacker.session.ended subscription and the BEHAVE-SHELL handler.
decnet-collector.service is up — it publishes attacker.session.ended from session_recorded log events.
Web API is up, and you have a viewer JWT in your browser's localStorage.
Deploy a single ssh decky:
```
$ decnet decky deploy --service ssh --decky smoke-decky
```
The decky's sessrec wrapper appends to /var/lib/decnet/artifacts/smoke-decky/ssh/transcripts/sessions-<UTC-DAY>.jsonl.
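A small helper can build the path of the shard that an SSH session should land in. The following sketch assumes `<UTC-DAY>` is an ISO `YYYY-MM-DD` date, so check an existing shard name before relying on it. `shard_path` and `count_records` are hypothetical helpers.

```python
from __future__ import annotations

from datetime import date, datetime, timezone
from pathlib import Path

# Root taken from the path above; everything below it mirrors <decky>/<service>/transcripts/.
ARTIFACTS_ROOT = Path("/var/lib/decnet/artifacts")


def shard_path(decky: str, service: str = "ssh", day: date | None = None) -> Path:
    """Expected sessrec shard for one UTC day (ISO YYYY-MM-DD assumed for <UTC-DAY>)."""
    if day is None:
        day = datetime.now(timezone.utc).date()  # "today" in UTC, not local time
    return ARTIFACTS_ROOT / decky / service / "transcripts" / f"sessions-{day.isoformat()}.jsonl"


def count_records(path: Path) -> int:
    """Number of non-empty JSONL lines written so far (0 if the shard does not exist yet)."""
    if not path.exists():
        return 0
    with path.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())
```

The day must be computed in UTC. A session started late in the evening in a UTC-4 zone already belongs to the next day's shard.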

Run one session per calibration class

For each class, SSH into the decky and reproduce the canonical workload. Log out via the documented exit path so the session_recorded event fires. The collector aggregates the session and publishes attacker.session.ended. The profiler worker then disk-reaches the shard, runs extract_session(), persists the rows, and publishes one attacker.observation.<primitive> per emission.
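The order of those profiler steps can be illustrated with a dependency-injected sketch. It is not `handle_session_ended` from `decnet.profiler.behave_shell._handler`. The `shard_ref` event key, the `Emission` fields and the payload shape are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Emission:
    """Hypothetical shape of one extract_session() result."""
    primitive: str
    value: str
    confidence: float


def handle_session_ended_sketch(
    event: dict[str, Any],
    reach_shard: Callable[[str], Iterable[dict]],
    extract_session: Callable[[Iterable[dict]], list[Emission]],
    persist: Callable[[list[Emission]], None],
    publish: Callable[[str, dict[str, Any]], None],
) -> int:
    """Mirror the documented order: disk-reach, extract, persist, then fan out on the bus."""
    records = reach_shard(event["shard_ref"])   # "shard_ref" key is assumed
    emissions = extract_session(records)
    persist(emissions)                          # rows are stored before anything is published
    for e in emissions:
        # One attacker.observation.<primitive> event per emission.
        publish(f"attacker.observation.{e.primitive}",
                {"value": e.value, "confidence": e.confidence})
    return len(emissions)
```

Injecting the four steps keeps the sketch testable with plain lists and lambdas. This is the same idea the offline replay relies on with its in-process publisher.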

| Class | Workload sketch | Expected dominant primitives |
|---|---|---|
| HUMAN | Type each command live; correct typos; pause to read output. | `motor.input_modality=typed`, `cognitive.feedback_loop_engagement=closed_loop` |
| YOU-sim | Paste short pre-canned commands at typing speed; minimal repeats. | `motor.input_modality=pasted`, `motor.paste_burst_rate=occasional`, `cognitive.command_branch_diversity=linear_playbook` |
| LW-sim | Paste a recon sweep generated by a small LLM; ~2-8s between pastes. | `cognitive.inter_command_latency_class=llm_lightweight` |
| CLAUDE-FF | Paste outputs from a fire-and-forget reasoning agent; ~8-30s gaps. | `cognitive.inter_command_latency_class=llm_heavyweight`, `cognitive.feedback_loop_engagement=fire_and_forget` |
| CLAUDE-CL | Drive a closed-loop plan-execute-observe agent; >30s pauses on long output. | `cognitive.inter_command_latency_class=long`, `cognitive.feedback_loop_engagement=closed_loop` |
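When checking the panel or the database by hand, it helps to keep the expectations above as data. The following is a Python sketch in which `EXPECTED`, `latest_values` and `mismatches` are hypothetical helpers. `latest_values` assumes rows arrive newest-first, which is how the verification query below orders them.

```python
# Expected dominant primitive values per calibration class, copied from the table above.
EXPECTED = {
    "HUMAN": {
        "motor.input_modality": "typed",
        "cognitive.feedback_loop_engagement": "closed_loop",
    },
    "YOU-sim": {
        "motor.input_modality": "pasted",
        "motor.paste_burst_rate": "occasional",
        "cognitive.command_branch_diversity": "linear_playbook",
    },
    "LW-sim": {
        "cognitive.inter_command_latency_class": "llm_lightweight",
    },
    "CLAUDE-FF": {
        "cognitive.inter_command_latency_class": "llm_heavyweight",
        "cognitive.feedback_loop_engagement": "fire_and_forget",
    },
    "CLAUDE-CL": {
        "cognitive.inter_command_latency_class": "long",
        "cognitive.feedback_loop_engagement": "closed_loop",
    },
}


def latest_values(rows):
    """Collapse (primitive, value, ...) rows ordered newest-first to {primitive: newest value}."""
    out = {}
    for primitive, value, *_rest in rows:
        out.setdefault(primitive, value)
    return out


def mismatches(cls, observed):
    """{primitive: (expected, observed or None)} for every expectation the session missed."""
    return {
        prim: (want, observed.get(prim))
        for prim, want in EXPECTED[cls].items()
        if observed.get(prim) != want
    }
```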

Verify

For each class, after disconnecting:

DB row landing: within ~30s (the profiler tick interval), the observations table holds one row per primitive for the new attacker:

$ sqlite3 /var/lib/decnet/decnet.db \
    "SELECT primitive, value, confidence FROM observations \
     WHERE evidence_ref LIKE 'shard:smoke-decky/%' ORDER BY ts DESC LIMIT 40;"

Bus events: tail the bus worker log. You should see one attacker.observation.<primitive> per emitted row, plus the originating attacker.session.ended.
AttackerDetail panel: open /attackers/<uuid> in the browser. The Behavioural primitives section should hydrate from the REST snapshot and live-update each time you replay the session (the SSE route forwards the new emissions in real time).

Pass criteria

All 5 classes produce ≥ 27 distinct primitives in the observations table (the per-shard hard gate from tests/profiler/behave_shell/test_calibration_grid.py).
The four day-one priority primitives appear in the panel and carry the expected values per class (table above).
No collector / profiler / web errors in the journal during the round-trip.
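The distinct-primitive gate can be checked against the same database with Python's built-in `sqlite3` module. This sketch assumes the `observations` columns used by the verification query above. The default pattern counts every smoke-decky row, so isolating a single session depends on the rest of the evidence_ref format, which this runbook does not document.

```python
import sqlite3

# Per-shard hard gate cited from tests/profiler/behave_shell/test_calibration_grid.py.
MIN_DISTINCT = 27


def distinct_primitives(conn: sqlite3.Connection, pattern: str = "shard:smoke-decky/%") -> int:
    """COUNT(DISTINCT primitive) over observations whose evidence_ref matches a LIKE pattern."""
    (n,) = conn.execute(
        "SELECT COUNT(DISTINCT primitive) FROM observations WHERE evidence_ref LIKE ?",
        (pattern,),
    ).fetchone()
    return n


def passes_gate(conn: sqlite3.Connection, pattern: str = "shard:smoke-decky/%") -> bool:
    """True when the matching rows cover at least MIN_DISTINCT distinct primitives."""
    return distinct_primitives(conn, pattern) >= MIN_DISTINCT
```

To check a live host, pass a read-only connection such as `sqlite3.connect("file:/var/lib/decnet/decnet.db?mode=ro", uri=True)` so the check cannot write to the profiler's database.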

If any class regresses, roll back the last commit and run the offline replay (smoke.sh) to localise the fault: it exercises the same handler without transport noise.