BEHAVE/CHANGELOG.md
anti b182e2fe3b feat(text): add meta.* corpus-footprint layer and 4 language-aware primitives (v0.1.3)
Adds 12 new primitives across two waves of spec work this session.

meta.* layer (8 primitives) — corpus-snapshot footprint:
  total_messages, corpus_span_days, msg_per_day, active_days,
  activity_density, first_seen_ts, last_seen_ts, fingerprint_confidence.
  Motivated by two actors with identical message counts (53 each) producing
  indistinguishable profiles despite radically different presence shapes
  (0.3-day burst vs 47-day long tail).

Language-aware characterization primitives (4 primitives):
  stylometric.pos_ngram_signature — SimHash over POS bigram frequency vector;
    syntactic skeleton fingerprint that survives full vocabulary paraphrase.
  lexical.dialect_region — BCP-47 free_string (es-CL, es-AR, es-MX, …);
    designed for EYENET integration with INGEOTEC regional-spanish-models.
  lexical.evaluative_morphology_density — diminutive/augmentative/pejorative
    suffix density; stable per-author trait baked into language acquisition.
  lexical.optional_grammar_signature — SimHash over optional-grammar choice
    points (compound/simple past, subjunctive, leísmo, relative pronoun);
    high-reliability Spain vs LatAm discriminator.

Also fixes stale scratchpad.md references throughout (README.md is now the
authority), bumps behave-text to 0.1.3, and updates CHANGELOG.
2026-05-23 01:54:12 -04:00


Changelog

All notable changes to BEHAVE packages are documented here. Format follows Keep a Changelog. Versions follow Semantic Versioning.


[behave-text 0.1.3] — 2026-05-23

behave-text

Added

  • stylometric.pos_ngram_signature — 64-bit SimHash over a POS n-gram (default: bigram) frequency vector. Captures syntactic skeleton independent of vocabulary. Tagger-dependent; source label must declare tagger + model + n. Calibration note: noisy on chat-domain text; weight low until validated.
  • lexical.dialect_region — BCP-47 language-region free_string (es-CL, es-AR, es-MX, es-ES, en-US, etc.) for the actor's dominant regional variety, detected from lexical marker density. Emit unknown below confidence threshold. Designed for EYENET integration with INGEOTEC regional-spanish-models vocabulary tables (MIT).
  • lexical.evaluative_morphology_density — numeric [0,1] rate of evaluative morpheme tokens (diminutives, augmentatives, pejoratives, intensives) per total tokens. Stable per-author trait baked into language acquisition; strong Spain/LatAm regional discriminator.
  • lexical.optional_grammar_signature — 64-bit SimHash over author preference probabilities at optional-grammar choice points (for Spanish: compound vs simple past, subjunctive usage, leísmo/laísmo/loísmo, relative pronoun choice). Choice-point set is extractor-defined and declared in source label.
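
The two SimHash-based primitives above can be illustrated with a minimal sketch. It assumes a plain weighted SimHash with blake2b as the per-feature hash; the function names (simhash64, pos_bigram_frequencies, hamming) and the hand-written tag sequences are hypothetical and do not come from the behave-text extractor, which may hash and weight features differently.

```python
import hashlib
from collections import Counter


def simhash64(weights: dict[str, float]) -> int:
    """Return a 64-bit SimHash of a weighted feature vector."""
    acc = [0.0] * 64
    for feature, weight in weights.items():
        # Stable 64-bit hash per feature; blake2b is an arbitrary choice here.
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            acc[bit] += weight if (h >> bit) & 1 else -weight
    # Each output bit is the sign of the accumulated weight at that position.
    return sum(1 << bit for bit in range(64) if acc[bit] > 0)


def pos_bigram_frequencies(tags: list[str]) -> dict[str, float]:
    """Relative frequency of each adjacent POS tag pair."""
    counts = Counter(zip(tags, tags[1:]))
    total = sum(counts.values())
    return {f"{a}>{b}": n / total for (a, b), n in counts.items()}


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


# Hand-written tag sequences standing in for real tagger output (assumption).
s1 = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]  # "the dog chased a red ball"
s2 = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]  # "a child found the old key"
s3 = ["VERB", "ADV", "PRON", "VERB", "ADV"]

sig1, sig2, sig3 = (simhash64(pos_bigram_frequencies(s)) for s in (s1, s2, s3))
print(hamming(sig1, sig2))  # 0: identical tag sequences, different vocabulary
print(hamming(sig1, sig3))  # distance for a different syntactic skeleton
```

Signatures are compared by Hamming distance, so similar frequency vectors tend to land on nearby bit patterns. The same simhash64 helper would apply to optional_grammar_signature if the feature weights were the author's preference probabilities at each declared choice point.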

[behave-text 0.1.2] — 2026-05-23

behave-text

Added

  • meta.* layer — 8 new corpus-snapshot primitives: total_messages, corpus_span_days, msg_per_day, active_days, activity_density, first_seen_ts, last_seen_ts, fingerprint_confidence. Distinguishes actors that have identical message counts but radically different presence shapes (bursty single-session vs long-tail lurker).
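
As a rough illustration of how such a footprint could be derived from message timestamps, here is a sketch under stated assumptions. The changelog does not define the formulas, so the ones below are guesses: msg_per_day divides by the span floored at one day, and activity_density is active UTC calendar days over calendar days covered. The Footprint class and footprint function are hypothetical names, and fingerprint_confidence is omitted because nothing here says how it is computed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Footprint:
    total_messages: int
    corpus_span_days: float
    msg_per_day: float
    active_days: int
    activity_density: float
    first_seen_ts: float
    last_seen_ts: float


def footprint(timestamps: list[float]) -> Footprint:
    """Summarize the presence shape of one actor from Unix timestamps."""
    if not timestamps:
        raise ValueError("at least one timestamp is required")
    first, last = min(timestamps), max(timestamps)
    span_days = (last - first) / SECONDS_PER_DAY
    # Distinct UTC calendar days with at least one message.
    days = {datetime.fromtimestamp(ts, tz=timezone.utc).date() for ts in timestamps}
    calendar_days = (max(days) - min(days)).days + 1
    return Footprint(
        total_messages=len(timestamps),
        corpus_span_days=span_days,
        # Assumption: floor the span at one day so short bursts are not extrapolated.
        msg_per_day=len(timestamps) / max(span_days, 1.0),
        active_days=len(days),
        # Assumption: share of covered calendar days that saw any activity.
        activity_density=len(days) / calendar_days,
        first_seen_ts=first,
        last_seen_ts=last,
    )


# Two synthetic actors with 53 messages each, evenly spaced (illustrative data only).
burst = footprint([i * 25_920 // 52 for i in range(53)])                 # 0.3-day span
tail = footprint([i * 47 * SECONDS_PER_DAY // 52 for i in range(53)])    # 47-day span
print(burst.corpus_span_days, burst.active_days, burst.msg_per_day)
print(tail.corpus_span_days, tail.active_days, tail.msg_per_day)
```

With equal message counts, the span, active-day count, and rate separate the two actors, which a profile built from total_messages alone would not.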

Fixed

  • Stale scratchpad.md references in primitives.py docstring, tests/test_primitives.py docstring, and attribution-recipes.md; README.md is now the authority.

[0.1.0] — 2026-05-17

Initial public release of all three packages.

behave-core

  • Shared observation envelope and schema contract (BehaveObservation)
  • Pydantic v2 base models for domain-agnostic behavioral records

behave-shell

  • Shell-session behavioral observation registry
  • Primitive catalog covering command execution, session lifecycle, environment, and navigation events
  • Layered on behave-core

behave-text

  • Text/messaging-domain behavioral observation registry
  • Primitive catalog covering message composition, conversation, and metadata events
  • Layered on behave-core