Adds 12 new primitives across two waves of spec work this session.
meta.* layer (8 primitives) — corpus-snapshot footprint:
total_messages, corpus_span_days, msg_per_day, active_days,
activity_density, first_seen_ts, last_seen_ts, fingerprint_confidence.
Motivated by two actors with identical message counts (53 each) producing
indistinguishable profiles despite radically different presence shapes
(0.3-day burst vs 47-day long tail).
Language-aware characterization primitives (4 primitives):
stylometric.pos_ngram_signature — SimHash over POS bigram frequency vector;
syntactic skeleton fingerprint that survives full vocabulary paraphrase.
lexical.dialect_region — BCP-47 free_string (es-CL, es-AR, es-MX, …);
designed for EYENET integration with INGEOTEC regional-spanish-models.
lexical.evaluative_morphology_density — diminutive/augmentative/pejorative
suffix density; stable per-author trait baked into language acquisition.
lexical.optional_grammar_signature — SimHash over optional-grammar choice
points (compound/simple past, subjunctive, leísmo, relative pronoun);
high-reliability Spain vs LatAm discriminator.
Also fixes stale scratchpad.md references throughout (README.md is now the
authority), bumps behave-text to 0.1.3, and updates CHANGELOG.
2.7 KiB
2.7 KiB
Changelog
All notable changes to BEHAVE packages are documented here. Format follows Keep a Changelog. Versions follow Semantic Versioning.
[behave-text 0.1.3] — 2026-05-23
behave-text
Added
stylometric.pos_ngram_signature— 64-bit SimHash over POS n-gram (default bigram) frequency vector. Captures syntactic skeleton independent of vocabulary. Tagger-dependent; source label must declare tagger + model + n. Calibration note: noisy on chat-domain text, weight low until validated.lexical.dialect_region— BCP-47 language-region free_string (es-CL,es-AR,es-MX,es-ES,en-US, etc.) for the actor's dominant regional variety, detected from lexical marker density. Emitunknownbelow confidence threshold. Designed for EYENET integration with INGEOTECregional-spanish-modelsvocabulary tables (MIT).lexical.evaluative_morphology_density— numeric [0,1] rate of evaluative morpheme tokens (diminutives, augmentatives, pejoratives, intensives) per total tokens. Stable per-author trait baked into language acquisition; strong Spain/LatAm regional discriminator.lexical.optional_grammar_signature— 64-bit SimHash over author preference probabilities at optional-grammar choice points (for Spanish: compound vs simple past, subjunctive usage, leísmo/laísmo/loísmo, relative pronoun choice). Choice-point set is extractor-defined and declared in source label.
[behave-text 0.1.2] — 2026-05-23
behave-text
Added
meta.*layer — 8 new corpus-snapshot primitives:total_messages,corpus_span_days,msg_per_day,active_days,activity_density,first_seen_ts,last_seen_ts,fingerprint_confidence. Fills the gap between actors with identical message counts but radically different presence shapes (bursty single-session vs long-tail lurker).
Fixed
- Stale
scratchpad.mdreferences inprimitives.pydocstring,tests/test_primitives.pydocstring, andattribution-recipes.md—README.mdis now the authority.
[0.1.0] — 2026-05-17
Initial public release of all three packages.
behave-core
- Shared observation envelope and schema contract (
BehaveObservation) - Pydantic v2 base models for domain-agnostic behavioral records
behave-shell
- Shell-session behavioral observation registry
- Primitive catalog covering command execution, session lifecycle, environment, and navigation events
- Layered on
behave-core
behave-text
- Text/messaging-domain behavioral observation registry
- Primitive catalog covering message composition, conversation, and metadata events
- Layered on
behave-core