BEHAVE/BEHAVE-TEXT/attribution-recipes.md
anti b182e2fe3b feat(text): add meta.* corpus-footprint layer and 4 language-aware primitives (v0.1.3)
Adds 12 new primitives across two waves of spec work this session.

meta.* layer (8 primitives) — corpus-snapshot footprint:
  total_messages, corpus_span_days, msg_per_day, active_days,
  activity_density, first_seen_ts, last_seen_ts, fingerprint_confidence.
  Motivated by two actors with identical message counts (53 each) producing
  indistinguishable profiles despite radically different presence shapes
  (0.3-day burst vs 47-day long tail).

Language-aware characterization primitives (4 primitives):
  stylometric.pos_ngram_signature — SimHash over POS bigram frequency vector;
    syntactic skeleton fingerprint that survives full vocabulary paraphrase.
  lexical.dialect_region — BCP-47 free_string (es-CL, es-AR, es-MX, …);
    designed for EYENET integration with INGEOTEC regional-spanish-models.
  lexical.evaluative_morphology_density — diminutive/augmentative/pejorative
    suffix density; stable per-author trait baked into language acquisition.
  lexical.optional_grammar_signature — SimHash over optional-grammar choice
    points (compound/simple past, subjunctive, leísmo, relative pronoun);
    high-reliability Spain vs LatAm discriminator.

Also fixes stale scratchpad.md references throughout (README.md is now the
authority), bumps behave-text to 0.1.3, and updates CHANGELOG.
2026-05-23 01:54:12 -04:00


<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
# BEHAVE-TEXT Attribution Recipes
> **This document is not part of BEHAVE-TEXT.** BEHAVE-TEXT (`README.md`) defines the observation taxonomy and emission envelope. It does **not** assert who an actor is, link sessions, or assign profiles. Those are attribution-engine concerns.
>
> This document is a **placeholder**. Recipes for the text domain wait for corpus calibration. The Rutify Telegram corpus (forthcoming) will be the labeling ground truth that drives the first concrete profiles.
---
## What goes here eventually
When BEHAVE-TEXT has a calibrated corpus, this document will mirror BEHAVE-SHELL's `attribution-recipes.md` structure:
1. **Engine Interface**: what the engine consumes from BEHAVE-TEXT (`actor.observation.text.*` topics) plus user-supplied labels (`identity.label.applied`), and what it emits (`attribution.profile.candidate`, `attribution.profile.current`, `attribution.linkage.proposed`).
2. **Profile Recipes**: observation-pattern definitions for each text-domain operator class. Likely starting points based on the Rutify domain:
   - `credential_broker`: high transactional_language, high boasting_pattern, broadcast attention_pattern.
   - `low_skill_buyer`: low vocabulary_richness, slow response_latency, high question_formation_style:lexical.
   - `group_admin`: high conversation_initiation_rate, focused attention_pattern, high opsec_awareness.
   - `lurker_or_observer`: minimal message volume, near-zero conversation_initiation_rate.
   - `bot_or_automated_poster`: perfect punctuation_style consistency, no typo_signature, machine-pasted message_length distribution.
3. **Linkage Rules**: rules for proposing identity links across accounts based on stylometric signature similarity. The function_word_distribution SimHash is the load-bearing primitive here (Hamming-comparable across sessions, hard to consciously fake).
4. **User-Owned Topic Schemas**: `identity.label.applied` and `identity.engagement.authorized` schemas for the text domain.
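
To make "Hamming-comparable" in item 3 concrete, here is a minimal sketch of how an engine might compare two function_word_distribution SimHash values. It is illustrative only: the 64-bit fingerprint width, the function names, and the `max_distance` cutoff are assumptions that do not come from BEHAVE-TEXT, and any real threshold would have to come out of corpus calibration.

```python
# Hypothetical helpers; none of these names are defined by BEHAVE-TEXT.

FINGERPRINT_BITS = 64  # assumed width; the spec may use a different size


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def simhash_similarity(a: int, b: int, bits: int = FINGERPRINT_BITS) -> float:
    """Fraction of matching bits: 1.0 for identical, 0.0 for fully inverted."""
    return 1.0 - hamming_distance(a, b) / bits


def within_linkage_distance(a: int, b: int, max_distance: int = 6) -> bool:
    """Return True when two fingerprints are close enough to be worth
    proposing as a link. max_distance=6 is an uncalibrated placeholder."""
    return hamming_distance(a, b) <= max_distance


if __name__ == "__main__":
    session_a = 0xFFFF0000FFFF0000
    session_b = 0xFFFF0000FFFF0001  # differs from session_a in one bit
    print(hamming_distance(session_a, session_b))
    print(within_linkage_distance(session_a, session_b))
```

A check like this would only yield a candidate for `attribution.linkage.proposed`. Whether a given distance actually indicates the same author is the question the labelled corpus has to answer.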
## What stays out
Same boundary as BEHAVE-SHELL's recipes: profiles describe observation *patterns*, not operator types. Engines combine BEHAVE-TEXT primitives with BEHAVE-SHELL primitives (when the same identity appears in both substrates) and with user-supplied labels to produce attribution.
## Status
**Empty until the Rutify corpus is processed.** Adding speculative recipes here without corpus validation would repeat the v0.1 mistake of emitting confidently wrong observations. The five labelled BEHAVE-SHELL sessions (HUMAN, YOU-sim, LW-sim, CLAUDE-FF, CLAUDE-CL) are the model: profiles get written *after* a labelled calibration grid exists, not before.