docs: per-package READMEs with full primitive catalog and registry notes backfill

- core/README.md: envelope contract, field table, PII discipline, quickstart
- BEHAVE-SHELL/README.md: all 76 primitives documented across 9 categories;
  TLS/SSH/C2 fingerprint sections with [DRAFT — verify] markers on uncertain entries
- BEHAVE-TEXT/README.md: all 35 primitives across 6 categories; Rutify calibration
  notes inline; content.* layer marked EXPERIMENTAL throughout
- primitives.py (SHELL): backfilled notes for all previously undocumented primitives
- primitives.py (TEXT): backfilled notes for capitalization_habit, emoji_*, length,
  linebreak_style, sentence_complexity_class, question_formation_style,
  imperative_style, response_latency_class, message_burst_rate

License: CC-BY-SA-4.0 (prose) / GPL-3.0-or-later (code)
2026-05-10 06:39:57 -04:00
parent 22b57307cf
commit 7f585027b3
5 changed files with 1159 additions and 91 deletions

core/README.md Normal file

@@ -0,0 +1,78 @@
<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
# behave-core
[← repo](../README.md)
The shared observation envelope for BEHAVE. Defines the wire format that
`behave-shell` and `behave-text` serialize all behavioral observations into.
Every sensor in the BEHAVE ecosystem emits the same `Observation` structure —
the domain-specific meaning lives in `primitive` and `value`; the envelope
provides identity, provenance, time window, and schema versioning.
## What it provides
| Symbol | Type | Description |
|---|---|---|
| `OBSERVATION_SCHEMA_VERSION` | `int` | Envelope schema version (currently `1`). Bumped when field shapes change; federation gossip receivers reject mismatched versions. |
| `Observation` | Pydantic model | One behavioral observation: a single primitive measured over a time window. The core class is registry-agnostic — it does not validate `primitive` or `value` against any specific domain. Use the registry-aware subclasses in `behave-shell` or `behave-text` for full validation. |
| `ObservationValue` | `Union[str, int, float, bool, list[str], list[int], list[float], dict]` | Type alias covering all valid value shapes. |
| `Window` | Pydantic model | The measurement window: `start_ts` and `end_ts` in epoch seconds. Distinct from `Observation.ts` (the emission time) — a sensor may compute an observation over a past window and emit it later. |
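
The version rule in the table can be sketched as a receiver-side gate. This is a minimal illustration, not the federation code: `accept_envelope` is a hypothetical helper, and the constant is redefined locally only so the snippet runs on its own; it assumes envelopes arrive as JSON objects carrying the `v` field.

```python
import json
from typing import Optional

# Assumption: mirrors OBSERVATION_SCHEMA_VERSION (currently 1) from behave-core.
OBSERVATION_SCHEMA_VERSION = 1


def accept_envelope(raw: str) -> Optional[dict]:
    """Decode a JSON envelope; return None when its schema version differs."""
    envelope = json.loads(raw)
    if envelope.get("v") != OBSERVATION_SCHEMA_VERSION:
        # Mismatched (or missing) version: reject rather than guess at field shapes.
        return None
    return envelope


accepted = accept_envelope('{"v": 1, "primitive": "motor.keystroke_cadence"}')
rejected = accept_envelope('{"v": 2, "primitive": "motor.keystroke_cadence"}')
```

Rejecting outright, instead of attempting a best-effort parse, keeps a receiver from misreading fields whose shape changed between versions.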
## `Observation` fields
| Field | Type | Required | Description |
|---|---|---|---|
| `primitive` | `str` | ✓ | Fully-qualified primitive path, e.g. `motor.keystroke_cadence` |
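| `value` | `ObservationValue` | ✓ | The measured value. The core class accepts any `ObservationValue` shape; the registry-aware subclasses in `behave-shell` and `behave-text` validate it against the domain registry. |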
| `confidence` | `float [0,1]` | ✓ | Sensor's confidence in this measurement (not in any attribution verdict) |
| `window` | `Window` | ✓ | Measurement time window |
| `source` | `str` | ✓ | Canonical sensor identifier, e.g. `behave/sniffer/timing.py` |
| `evidence_ref` | `str \| None` | — | Pointer to underlying raw evidence (session tape, pcap). **Never** the evidence itself — see PII note below. |
| `identity_ref` | `str \| None` | — | AttackerIdentity UUID if the observation is pre-attributed |
| `ts` | `float` | auto | Emission timestamp, epoch seconds |
| `id` | `str` | auto | UUID hex for deduplication |
| `v` | `int` | auto | Envelope schema version (= `OBSERVATION_SCHEMA_VERSION`) |
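
To make the required, optional, and auto-filled columns concrete, here is a stdlib-only stand-in for the envelope. It is not the real Pydantic model: `ObservationSketch` and `WindowSketch` are hypothetical names, and the use of `time.time()` and `uuid4` for the auto fields is an assumption about how those defaults might be produced.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional

OBSERVATION_SCHEMA_VERSION = 1  # assumption: mirrors the behave-core constant


@dataclass
class WindowSketch:
    start_ts: float  # epoch seconds
    end_ts: float    # epoch seconds


@dataclass
class ObservationSketch:
    # Required fields
    primitive: str
    value: Any
    confidence: float
    window: WindowSketch
    source: str
    # Optional fields
    evidence_ref: Optional[str] = None
    identity_ref: Optional[str] = None
    # Auto-filled fields (assumed defaults, see lead-in)
    ts: float = field(default_factory=time.time)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    v: int = OBSERVATION_SCHEMA_VERSION

    def __post_init__(self) -> None:
        # The field table constrains confidence to [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


obs = ObservationSketch(
    primitive="motor.keystroke_cadence",
    value="bursty",
    confidence=0.82,
    window=WindowSketch(start_ts=1714000000.0, end_ts=1714003600.0),
    source="behave/sniffer/timing.py",
)
```

Only the five required fields are passed; `ts`, `id`, and `v` are populated on construction, and the two `*_ref` fields stay `None` until a sensor has a pointer or an attribution to attach.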
## PII discipline (non-negotiable)
BEHAVE observations carry **categorical labels, timing aggregates, and hashes only**.
They must never carry:
- Raw keystroke content or command arguments
- Passwords, tokens, session keys, or any authentication material
- File contents or payload bytes
- Raw message text (especially in `behave-text`)
`evidence_ref` is a **pointer** to underlying evidence held elsewhere, never the evidence itself.
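
One way to stay inside these rules is to reduce raw events to a label before anything is serialized. The sketch below is illustrative only: `cadence_label`, the `"steady"` and `"unknown"` labels, the threshold, and the `tape://` pointer format are all hypothetical, not part of any BEHAVE registry.

```python
import statistics
from typing import Sequence, Tuple

# Hypothetical threshold on the coefficient of variation of inter-key gaps.
BURSTY_CV_THRESHOLD = 1.0


def cadence_label(events: Sequence[Tuple[float, str]]) -> str:
    """Reduce (timestamp, key) events to a categorical label.

    Only timestamps are read; the key content is discarded immediately.
    """
    times = sorted(ts for ts, _key in events)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return "unknown"
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return "unknown"
    cv = statistics.pstdev(gaps) / mean_gap
    return "bursty" if cv > BURSTY_CV_THRESHOLD else "steady"


# Raw events stay on the sensor; only the label and a pointer leave it.
raw_events = [(0.00, "l"), (0.05, "s"), (0.10, " "), (0.15, "-"), (3.00, "l")]
payload = {
    "value": cadence_label(raw_events),
    "evidence_ref": "tape://session-0001",  # hypothetical pointer format
}
```

The payload carries a categorical value and a reference; the keystrokes themselves never enter it.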
## Install
```bash
pip install -e .
# or, as a dependency of behave-shell / behave-text:
pip install -e ../core/
```
## Quickstart
```python
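from behave_core.spec import OBSERVATION_SCHEMA_VERSION, Observation, Window

# One observation: a categorical cadence label measured over a one-hour window.
obs = Observation(
    primitive="motor.keystroke_cadence",
    value="bursty",
    confidence=0.82,
    window=Window(start_ts=1714000000.0, end_ts=1714003600.0),
    source="behave/shell-sensor/timing.py",
)
# `v` is filled in automatically from the envelope schema version.
assert obs.v == OBSERVATION_SCHEMA_VERSION
print(obs.model_dump_json())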
```
## Tests
```bash
pytest tests/
```
## License
Code: [GPL-3.0-or-later](../LICENSE)