
anti b182e2fe3b feat(text): add meta.* corpus-footprint layer and 4 language-aware primitives (v0.1.3)

Adds 12 new primitives across two waves of spec work this session.

meta.* layer (8 primitives) — corpus-snapshot footprint:
  total_messages, corpus_span_days, msg_per_day, active_days,
  activity_density, first_seen_ts, last_seen_ts, fingerprint_confidence.
  Motivated by two actors with identical message counts (53 each) producing
  indistinguishable profiles despite radically different presence shapes
  (0.3-day burst vs 47-day long tail).
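  The meta.* fields above are simple functions of message timestamps. The block
  below is one possible reading, not the behave-text implementation. It makes
  several assumptions:
    - timestamps are UNIX seconds;
    - an active day is a distinct UTC calendar date with at least one message;
    - the msg_per_day denominator is floored at one day, so a sub-day burst or
      a single message cannot divide by zero;
    - activity_density is active days divided by calendar days spanned.
  fingerprint_confidence is omitted because its definition is not given here.
  MetaFootprint and meta_footprint are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class MetaFootprint:
    # Hypothetical container; field names mirror the meta.* primitives.
    total_messages: int
    first_seen_ts: float
    last_seen_ts: float
    corpus_span_days: float
    msg_per_day: float
    active_days: int
    activity_density: float


def meta_footprint(timestamps: list[float]) -> MetaFootprint:
    """Sketch of a corpus-snapshot footprint from UNIX timestamps (seconds)."""
    if not timestamps:
        raise ValueError("footprint needs at least one message")
    ts = sorted(timestamps)
    first, last = ts[0], ts[-1]
    span_days = (last - first) / SECONDS_PER_DAY
    # Assumption: an "active day" is a distinct UTC calendar date with >= 1 message.
    dates = {datetime.fromtimestamp(t, tz=timezone.utc).date() for t in ts}
    calendar_days = (max(dates) - min(dates)).days + 1
    return MetaFootprint(
        total_messages=len(ts),
        first_seen_ts=first,
        last_seen_ts=last,
        corpus_span_days=span_days,
        # Assumption: floor the denominator at one day (avoids divide-by-zero).
        msg_per_day=len(ts) / max(span_days, 1.0),
        active_days=len(dates),
        # Assumption: fraction of spanned calendar days that saw any activity.
        activity_density=len(dates) / calendar_days,
    )


# Two hypothetical actors with 53 messages each.
BASE = 1_700_006_400  # 2023-11-15T00:00:00Z
burst = [BASE + i * (0.3 * SECONDS_PER_DAY / 52) for i in range(53)]
long_tail = [BASE + 3600] * 45 + [
    BASE + d * SECONDS_PER_DAY + 3600 for d in (6, 12, 18, 24, 30, 36, 42, 47)
]

for name, series in (("burst", burst), ("long_tail", long_tail)):
    fp = meta_footprint(series)
    print(name, fp.total_messages, round(fp.corpus_span_days, 2),
          round(fp.msg_per_day, 2), fp.active_days, round(fp.activity_density, 3))
```

  Both synthetic actors have total_messages == 53, but the footprints diverge.
  The burst collapses to one active day at full density. The long tail spreads
  across 47 days with 9 active days.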

Language-aware characterization primitives (4 primitives):
  stylometric.pos_ngram_signature — SimHash over POS bigram frequency vector;
    syntactic skeleton fingerprint that survives full vocabulary paraphrase.
  lexical.dialect_region — BCP-47 free_string (es-CL, es-AR, es-MX, …);
    designed for EYENET integration with INGEOTEC regional-spanish-models.
  lexical.evaluative_morphology_density — diminutive/augmentative/pejorative
    suffix density; stable per-author trait baked into language acquisition.
  lexical.optional_grammar_signature — SimHash over optional-grammar choice
    points (compound/simple past, subjunctive, leísmo, relative pronoun);
    high-reliability Spain vs LatAm discriminator.

Also fixes stale scratchpad.md references throughout (README.md is now the
authority), bumps behave-text to 0.1.3, and updates CHANGELOG.

2026-05-23 01:54:12 -04:00

2.7 KiB


BEHAVE-TEXT Attribution Recipes

This document is not part of BEHAVE-TEXT. BEHAVE-TEXT (README.md) defines the observation taxonomy and emission envelope. It does not assert who an actor is, link sessions, or assign profiles. Those are attribution-engine concerns.

This document is a placeholder. Recipes for the text domain are on hold until a corpus is calibrated. The forthcoming Rutify Telegram corpus will supply the labelled ground truth for the first concrete profiles.

What goes here eventually

When BEHAVE-TEXT has a calibrated corpus, this document will mirror BEHAVE-SHELL's attribution-recipes.md structure:

- Engine Interface: what the engine consumes from BEHAVE-TEXT (actor.observation.text.* topics) plus user-supplied labels (identity.label.applied), and what it emits (attribution.profile.candidate, attribution.profile.current, attribution.linkage.proposed).
- Profile Recipes: observation-pattern definitions for each text-domain operator class. Likely starting points based on the Rutify domain:
  - credential_broker: high transactional_language, high boasting_pattern, broadcast attention_pattern.
  - low_skill_buyer: low vocabulary_richness, slow response_latency, high question_formation_style:lexical.
  - group_admin: high conversation_initiation_rate, focused attention_pattern, high opsec_awareness.
  - lurker_or_observer: minimal message volume, near-zero conversation_initiation_rate.
  - bot_or_automated_poster: perfect punctuation_style consistency, no typo_signature, machine-pasted message_length distribution.
- Linkage Rules: rules for proposing identity links across accounts based on stylometric signature similarity. The function_word_distribution SimHash is the load-bearing primitive here (Hamming-comparable across sessions, hard to consciously fake).
- User-Owned Topic Schemas: identity.label.applied and identity.engagement.authorized schemas for the text domain.
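The Linkage Rules item depends on Hamming comparison of the function_word_distribution SimHash. Below is a minimal sketch of how an engine might turn that comparison into linkage proposals. It assumes a 64-bit fingerprint has already been computed for each account. The threshold, the similarity score, the account IDs, and the proposal shape are all hypothetical placeholders. Consistent with the Status section, nothing here is calibrated.

```python
from dataclasses import dataclass
from itertools import combinations

# Placeholder only: NOT calibrated. A real value must come from labelled data.
HYPOTHETICAL_MAX_HAMMING = 10


@dataclass(frozen=True)
class AccountSignature:
    account_id: str
    function_word_simhash: int  # assumed precomputed 64-bit SimHash


@dataclass(frozen=True)
class LinkageProposal:
    # Hypothetical payload for an attribution.linkage.proposed event.
    account_a: str
    account_b: str
    hamming: int
    similarity: float  # 1 - hamming / bits


def hamming(x: int, y: int) -> int:
    return bin(x ^ y).count("1")


def propose_links(
    signatures: list[AccountSignature],
    bits: int = 64,
    max_hamming: int = HYPOTHETICAL_MAX_HAMMING,
) -> list[LinkageProposal]:
    """Propose (never assert) links between accounts with near-identical fingerprints."""
    proposals = []
    for a, b in combinations(signatures, 2):
        d = hamming(a.function_word_simhash, b.function_word_simhash)
        if d <= max_hamming:
            proposals.append(
                LinkageProposal(a.account_id, b.account_id, d, 1 - d / bits)
            )
    # Closest pairs first.
    return sorted(proposals, key=lambda p: p.hamming)


sigs = [
    AccountSignature("acct_a", 0xFFFF0000FFFF0000),
    AccountSignature("acct_b", 0xFFFF0000FFFF0003),  # differs from acct_a in 2 bits
    AccountSignature("acct_c", 0x0000FFFF0000FFFF),  # bitwise complement of acct_a
]
for p in propose_links(sigs):
    print(p)
```

All-pairs comparison is quadratic in the number of accounts. At larger scale, an engine would more likely bucket fingerprints first (for example, by exact-matching bit bands, as in SimHash near-duplicate detection) and compute distances only within buckets.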

What stays out

Same boundary as BEHAVE-SHELL's recipes: profiles describe observation patterns, not operator types. Engines combine BEHAVE-TEXT primitives with BEHAVE-SHELL primitives (when the same identity appears in both substrates) and with user-supplied labels to produce attribution.

Status

Empty until the Rutify corpus is processed. Adding speculative recipes here without corpus validation would repeat the v0.1 mistake of emitting confidently wrong observations. The five labelled BEHAVE-SHELL sessions (HUMAN, YOU-sim, LW-sim, CLAUDE-FF, CLAUDE-CL) are the model: profiles are written after a labelled calibration grid exists, not before.
