feat(text): add meta.* corpus-footprint layer and 4 language-aware primitives (v0.1.3)

Adds 12 new primitives (8 meta.* and 4 language-aware) across two waves of
spec work this session.

meta.* layer (8 primitives), corpus-snapshot footprint:
  total_messages, corpus_span_days, msg_per_day, active_days,
  activity_density, first_seen_ts, last_seen_ts, fingerprint_confidence.
Motivated by two actors with identical message counts (53 each) whose
profiles were indistinguishable despite radically different presence shapes
(a 0.3-day burst vs. a 47-day long tail).

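A minimal sketch of how such a footprint could be derived from message timestamps. The formulas are assumptions made for illustration, not the definitions used by behave-text: `corpus_span_days` is taken as last minus first timestamp, `active_days` as distinct UTC calendar days, and `activity_density` as active days over calendar days covered. `fingerprint_confidence` is left out because the message does not say how it is computed. `MetaFootprint` and `meta_footprint` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class MetaFootprint:
    total_messages: int
    corpus_span_days: float
    msg_per_day: float
    active_days: int
    activity_density: float
    first_seen_ts: float
    last_seen_ts: float


def _utc_day(ts: float):
    """Calendar date (UTC) of a Unix timestamp in seconds."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date()


def meta_footprint(timestamps: list[float]) -> MetaFootprint:
    """Summarize presence shape from Unix timestamps (assumed definitions)."""
    if not timestamps:
        raise ValueError("need at least one timestamp")
    first, last = min(timestamps), max(timestamps)
    span_days = (last - first) / SECONDS_PER_DAY
    active = {_utc_day(t) for t in timestamps}
    # Calendar days covered by the span, counting both endpoints.
    calendar_days = (_utc_day(last) - _utc_day(first)).days + 1
    total = len(timestamps)
    return MetaFootprint(
        total_messages=total,
        corpus_span_days=span_days,
        # A zero-length span is treated as one day to avoid dividing by zero.
        msg_per_day=total / span_days if span_days > 0 else float(total),
        active_days=len(active),
        activity_density=len(active) / calendar_days,
        first_seen_ts=first,
        last_seen_ts=last,
    )


# Two synthetic actors with 53 messages each, evenly spaced for simplicity.
T0 = 1_700_000_000
burst = [T0 + i * (0.3 * SECONDS_PER_DAY / 52) for i in range(53)]
long_tail = [T0 + i * (47 * SECONDS_PER_DAY / 52) for i in range(53)]

burst_fp = meta_footprint(burst)
long_tail_fp = meta_footprint(long_tail)
```

Both footprints report 53 messages, but the spans come out near 0.3 and 47 days, so `msg_per_day` separates the two actors where a bare message count cannot.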
Language-aware characterization primitives (4 primitives):
  stylometric.pos_ngram_signature: SimHash over the POS bigram frequency
    vector; a syntactic-skeleton fingerprint that survives full vocabulary
    paraphrase.
  lexical.dialect_region: BCP-47 free_string (es-CL, es-AR, es-MX, …);
    designed for EYENET integration with INGEOTEC regional-spanish-models.
  lexical.evaluative_morphology_density: density of diminutive, augmentative
    and pejorative suffixes; a stable per-author trait established during
    language acquisition.
  lexical.optional_grammar_signature: SimHash over optional-grammar choice
    points (compound vs. simple past, subjunctive, leísmo, relative pronoun);
    a high-reliability Spain vs. LatAm discriminator.

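A minimal sketch of a SimHash over POS bigram counts, to illustrate the idea behind stylometric.pos_ngram_signature. It assumes POS tagging has already happened upstream and that the signature is 64 bits wide; the hash function (BLAKE2b), the weighting by raw counts, and the names `pos_bigram_simhash` and `hamming` are illustrative choices, not the behave-text implementation.

```python
import hashlib
from collections import Counter


def pos_bigram_simhash(tag_sequences: list[list[str]], bits: int = 64) -> int:
    """SimHash of POS bigram counts over a set of tagged sentences."""
    counts = Counter()
    for tags in tag_sequences:
        counts.update(zip(tags, tags[1:]))  # adjacent tag pairs

    acc = [0] * bits
    for (left, right), weight in counts.items():
        key = f"{left}\x1f{right}".encode()
        digest = hashlib.blake2b(key, digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        # Each bigram votes +weight or -weight on every bit position.
        for i in range(bits):
            acc[i] += weight if (h >> i) & 1 else -weight

    # The sign of each accumulated vote becomes one bit of the signature.
    return sum(1 << i for i, v in enumerate(acc) if v > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


# Same syntactic skeleton, entirely different words: only the tags are hashed.
original = [["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]]
paraphrase = [["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]]

sig_a = pos_bigram_simhash(original)
sig_b = pos_bigram_simhash(paraphrase)
```

Because only tags enter the hash, a paraphrase that keeps the tag sequence yields the same signature by construction; texts with similar but not identical bigram distributions would be expected to land at a small Hamming distance.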
Also fixes stale scratchpad.md references throughout (README.md is now the
authority), bumps behave-text to 0.1.3, and updates CHANGELOG.
@@ -1,9 +1,9 @@
 # SPDX-License-Identifier: GPL-3.0-or-later
 """Registry coverage tests for BEHAVE-TEXT.
 
-Asserts that every primitive listed in scratchpad.md's tables has exactly one
+Asserts that every primitive listed in README.md's tables has exactly one
 entry in PRIMITIVE_REGISTRY. Drift-detector — failing this test means
-scratchpad.md and the registry have diverged.
+README.md and the registry have diverged.
 """
 
 from __future__ import annotations
@@ -13,9 +13,18 @@ from pathlib import Path
 
 from behave_text.spec import PRIMITIVE_REGISTRY, ValueKind
 
-# Primitive paths expected by scratchpad.md (hand-extracted; v0).
+# Primitive paths expected by README.md (hand-extracted; v0).
 EXPECTED_PRIMITIVES = {
-    # stylometric.* (motor analog — 8)
+    # meta.* (corpus-snapshot footprint — 8)
+    "meta.total_messages",
+    "meta.corpus_span_days",
+    "meta.msg_per_day",
+    "meta.active_days",
+    "meta.activity_density",
+    "meta.first_seen_ts",
+    "meta.last_seen_ts",
+    "meta.fingerprint_confidence",
+    # stylometric.* (motor analog — 13)
     "stylometric.punctuation_style",
     "stylometric.capitalization_habit",
     "stylometric.emoji_usage",
@@ -28,7 +37,8 @@ EXPECTED_PRIMITIVES = {
     "stylometric.function_word_distribution_top200",
     "stylometric.character_ngram_simhash",
     "stylometric.distinctive_vocabulary_signature",
-    # lexical.* (cognitive analog — 8)
+    "stylometric.pos_ngram_signature",
+    # lexical.* (cognitive analog — 11)
     "lexical.vocabulary_richness",
     "lexical.slang_density",
     "lexical.code_switching_rate",
@@ -37,6 +47,9 @@ EXPECTED_PRIMITIVES = {
     "lexical.sentence_complexity_class",
     "lexical.question_formation_style",
     "lexical.imperative_style",
+    "lexical.dialect_region",
+    "lexical.evaluative_morphology_density",
+    "lexical.optional_grammar_signature",
     # temporal_evolution.* (lifecycle/change-over-time — 1, added v0.2)
     "temporal_evolution.lifecycle_phase",
     # network.* (governance/role-shape — 2, added v0.3)
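The drift check described in the test module's docstring (every expected primitive has exactly one registry entry) could look roughly like the sketch below. The real shape of `PRIMITIVE_REGISTRY` is not shown in this diff, so the sketch works on a plain list of registered paths; `coverage_drift` and the sample data are hypothetical stand-ins.

```python
from collections import Counter


def coverage_drift(expected: set[str], registered: list[str]) -> dict[str, list[str]]:
    """Report how a list of registered paths diverges from the expected set."""
    counts = Counter(registered)
    seen = set(counts)
    return {
        "missing": sorted(expected - seen),        # documented, not registered
        "unexpected": sorted(seen - expected),     # registered, not documented
        "duplicated": sorted(p for p, n in counts.items() if n > 1),
    }


# Hypothetical sample: one path missing, one registered twice.
expected_paths = {
    "meta.total_messages",
    "lexical.dialect_region",
    "stylometric.pos_ngram_signature",
}
registered_paths = [
    "meta.total_messages",
    "meta.total_messages",
    "lexical.dialect_region",
]

report = coverage_drift(expected_paths, registered_paths)
```

A test built on this would assert that all three lists are empty, which keeps the failure message specific about which side drifted.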