Campaign Clustering
Pre-implementation feature. Goal: graduate per-attacker attribution into campaign-level grouping — recover the fact that a set of distinct attacker rows are in fact one coordinated operation, even when IPs, ASNs, and toolchains diverge.
The full design lives in the repo at
development/CAMPAIGN_CLUSTERING.md.
This page documents the test infrastructure that ships ahead of
the algorithm.
The order is deliberate: simulator first, algorithm second. If we cannot write down what a campaign is in code that produces ground-truth labels, we cannot validate any clusterer we build. The simulator is the specification.
Vocabulary — Unified Kill Chain
decnet/clustering/ukc.py defines UKCPhase, the canonical phase enum.
Pols' Unified Kill Chain (2017): 19 phases across three stages.
| Stage | Phases | Honeypot-observable? |
|---|---|---|
| In (initial foothold) | reconnaissance, resource_development, weaponization, delivery, social_engineering, exploitation, persistence, defense_evasion, command_and_control | partial — pre-target phases (recon/resource_dev/weaponization/social_eng) happen before any decky is touched |
| Through (network propagation) | pivoting, discovery, privilege_escalation, execution, credential_access, lateral_movement | yes — MazeNET-segmented topologies make this a strength, not a gap |
| Out (action on objectives) | collection, exfiltration, impact, objectives | yes |
OBSERVABLE_PHASES is the frozenset (15 of 19) the synthetic generator
will emit events for. The DSL accepts the full enum so a campaign spec
can describe an end-to-end story; unobservable phases parse and validate
but produce no synthetic events. UKC vocabulary aligns with MITRE ATT&CK
tactics, so the same labels will be produced by the future TTP-tagging
worker — fixtures don't need renaming when that lands.
from decnet.clustering.ukc import UKCPhase, OBSERVABLE_PHASES, stage_of
stage_of(UKCPhase.LATERAL_MOVEMENT) # → "through"
UKCPhase.RECONNAISSANCE in OBSERVABLE_PHASES # → False
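For readers without the repo checked out, here is a minimal sketch of how such a vocabulary module could be laid out. It is not the contents of decnet/clustering/ukc.py: the member names other than LATERAL_MOVEMENT and RECONNAISSANCE, the string values, and the choice of the four pre-target phases as the unobservable set are assumptions read off the table above.

```python
from enum import Enum


class UKCPhase(str, Enum):
    # Stage "in": initial foothold
    RECONNAISSANCE = "reconnaissance"
    RESOURCE_DEVELOPMENT = "resource_development"
    WEAPONIZATION = "weaponization"
    DELIVERY = "delivery"
    SOCIAL_ENGINEERING = "social_engineering"
    EXPLOITATION = "exploitation"
    PERSISTENCE = "persistence"
    DEFENSE_EVASION = "defense_evasion"
    COMMAND_AND_CONTROL = "command_and_control"
    # Stage "through": network propagation
    PIVOTING = "pivoting"
    DISCOVERY = "discovery"
    PRIVILEGE_ESCALATION = "privilege_escalation"
    EXECUTION = "execution"
    CREDENTIAL_ACCESS = "credential_access"
    LATERAL_MOVEMENT = "lateral_movement"
    # Stage "out": action on objectives
    COLLECTION = "collection"
    EXFILTRATION = "exfiltration"
    IMPACT = "impact"
    OBJECTIVES = "objectives"


# Enum iteration follows definition order, so slices map cleanly to stages.
_PHASES = list(UKCPhase)
_STAGE_BY_PHASE = {
    **{p: "in" for p in _PHASES[:9]},
    **{p: "through" for p in _PHASES[9:15]},
    **{p: "out" for p in _PHASES[15:]},
}

# Assumption: the four pre-target phases are the unobservable ones.
_PRE_TARGET = frozenset({
    UKCPhase.RECONNAISSANCE,
    UKCPhase.RESOURCE_DEVELOPMENT,
    UKCPhase.WEAPONIZATION,
    UKCPhase.SOCIAL_ENGINEERING,
})
OBSERVABLE_PHASES = frozenset(UKCPhase) - _PRE_TARGET


def stage_of(phase: UKCPhase) -> str:
    """Return the UKC stage ("in", "through", or "out") for a phase."""
    return _STAGE_BY_PHASE[phase]
```

Deriving OBSERVABLE_PHASES by subtraction keeps the 15-of-19 count from drifting if a phase is ever added to the enum.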
The synthetic campaign factory
tests/factories/campaign_factory.py parses a YAML DSL describing
actors, UKC phases, and tool signatures, and emits truth-labeled
SyntheticAttacker / SyntheticSession records.
Key contract: deterministic given a seed. Identical YAML + identical seed → identical attacker IDs and session IDs across runs. This is load-bearing for fixture stability and is checked by an explicit test.
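One way to honour that contract, sketched under assumptions: derive every ID from the seed plus stable spec fields, and draw all randomness from a private random.Random(seed). The helper name make_ids, its parameters, and the ID formats below are hypothetical and do not describe the real factory.

```python
import hashlib
import random


def make_ids(campaign_id: str, actor_ids: list[str], seed: int,
             sessions_per_actor: int = 2) -> dict[str, list[str]]:
    """Map each derived attacker ID to its derived session IDs (hypothetical shape)."""
    # Private RNG: reproducible, and isolated from the global random state.
    rng = random.Random(seed)
    out: dict[str, list[str]] = {}
    for actor_id in actor_ids:
        # Hash of stable inputs only: no uuid4(), no wall clock, no built-in hash().
        material = f"{seed}:{campaign_id}:{actor_id}".encode()
        attacker_id = hashlib.sha256(material).hexdigest()
        out[attacker_id] = [
            f"{attacker_id[:12]}-{rng.getrandbits(32):08x}"
            for _ in range(sessions_per_actor)
        ]
    return out


first = make_ids("c-apt-fauxbear-01", ["a-001", "a-002"], seed=0)
second = make_ids("c-apt-fauxbear-01", ["a-001", "a-002"], seed=0)
print(first == second)
```

The built-in hash() is deliberately avoided: string hashes are salted per process by default, so IDs built on it would change between runs.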
Quick poke from a Python REPL
The factory is a library, not a test module — pytest does not collect it. Drive it directly:
source .311/bin/activate
python -c "
from tests.factories.campaign_factory import generate, load_yaml
spec = load_yaml('tests/fixtures/campaigns/lone_wolf.yaml')
corpus = generate(spec, seed=0)
print(f'{len(corpus.attackers)} attackers, {len(corpus.sessions)} sessions')
for a in corpus.attackers[:3]:
    print(f' {a.attacker_id[:24]} → campaign={a.truth_campaign_id}')
print('truth labels:', corpus.truth_labels())
"
DSL shape
corpus:
  campaigns:
    - campaign:
        id: c-apt-fauxbear-01
        actors:
          - id: a-001
            asn: 14061
            ip_pool: rotating          # rotating | sticky | tor
            ja3: "769,4865-..."
            hassh: "aae6b9..."
            hours_active_utc: [22, 23, 0, 1, 2, 3]
            jitter_seconds: 90
        phases:                        # any UKCPhase value
          - name: delivery
            actor: a-001
            tool_signature: { user_agent: "Nmap" }
            target_selector: { count: 50 }
            dwell_seconds: 1
          - name: persistence
            actor: a-001
            target_selector: { decky: previous_success }
        duration_days: 7
        pause_windows: []              # [[start_day, end_day], ...]
  noise:
    scanner_count: 8                   # opportunistic singletons
Single-campaign specs may omit the corpus: wrapper and provide
campaign: at the top level.
Hard validation errors
The DSL parser (_validate_campaign_spec) raises DSLValidationError
on:
- missing campaign, id, actors, or phases keys
- empty actor list
- unknown UKC phase names
- a phase referencing an actor not declared in actors:
Unobservable phases produce a warning, not an error — the spec is allowed to describe pre-target activity, the generator just emits nothing for it.
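The rules above can be sketched as a plain function over an already-parsed spec. This is an illustrative reimplementation, not the real _validate_campaign_spec: the function name, the error messages, and the use of warnings.warn for unobservable phases are assumptions, and the phase sets are hard-coded here only to keep the example self-contained.

```python
import warnings

KNOWN_PHASES = frozenset({
    "reconnaissance", "resource_development", "weaponization", "delivery",
    "social_engineering", "exploitation", "persistence", "defense_evasion",
    "command_and_control", "pivoting", "discovery", "privilege_escalation",
    "execution", "credential_access", "lateral_movement", "collection",
    "exfiltration", "impact", "objectives",
})
# Assumption: the pre-target phases are the unobservable ones.
UNOBSERVABLE = frozenset({
    "reconnaissance", "resource_development", "weaponization", "social_engineering",
})


class DSLValidationError(ValueError):
    """Raised when a campaign spec breaks a hard rule."""


def validate_campaign_spec(spec: dict) -> None:
    if "campaign" not in spec:
        raise DSLValidationError("missing 'campaign' key")
    campaign = spec["campaign"]
    for key in ("id", "actors", "phases"):
        if key not in campaign:
            raise DSLValidationError(f"missing '{key}' key")
    if not campaign["actors"]:
        raise DSLValidationError("actor list is empty")

    declared = {actor["id"] for actor in campaign["actors"]}
    for phase in campaign["phases"]:
        name = phase.get("name")
        if name not in KNOWN_PHASES:
            raise DSLValidationError(f"unknown UKC phase: {name!r}")
        if phase.get("actor") not in declared:
            raise DSLValidationError(
                f"phase {name!r} references undeclared actor {phase.get('actor')!r}"
            )
        if name in UNOBSERVABLE:
            # Soft failure: the spec is valid, the generator just emits nothing.
            warnings.warn(f"phase {name!r} is unobservable; no events will be emitted")
```

Keeping hard errors and warnings in the same pass means a spec author sees every soft issue up to the first hard one in a single run.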
Metric harness
tests/clustering/metrics.py. Pure-Python, no sklearn/numpy dependency.
Decided before any clustering algorithm exists, on purpose: pick the
metric after seeing results and you'll pick the one that flatters the
algorithm.
Four metrics, none individually sufficient:
| Metric | Catches | Range |
|---|---|---|
| adjusted_rand_index | overall partition agreement (chance-corrected) | typically [0, 1]; negative possible |
| homogeneity | false merges — distinct campaigns wrongly fused | [0, 1] |
| completeness | false splits — one campaign torn across clusters | [0, 1] |
| singleton_recall | noise absorption — lone wolves swallowed by real campaigns | [0, 1] |
Homogeneity and completeness trade off; both must be reported. Singleton recall exists because ARI/homogeneity/completeness all dilute the cost of absorbing background scanners — and that absorption is the failure mode that makes attribution useless in practice.
from tests.clustering.metrics import score
truth = {"a": "C1", "b": "C1", "c": "C2"}
pred = {"a": "X", "b": "X", "c": "Y"}
score(truth, pred)
# {
# 'adjusted_rand_index': 1.0,
# 'homogeneity': 1.0,
# 'completeness': 1.0,
# 'singleton_recall': 1.0,
# }
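To make the trade-off concrete, here is a pure-Python sketch of three of the metrics. Homogeneity and completeness follow the standard conditional-entropy definitions. The singleton_recall definition is an assumption (the share of truth singletons that also end up alone in the prediction), and none of this is taken from tests/clustering/metrics.py.

```python
from collections import Counter
from math import log


def _entropy(counts, n):
    """Shannon entropy of a partition given its cluster sizes."""
    return -sum((c / n) * log(c / n) for c in counts if c)


def _conditional_entropy(labels_a, labels_b):
    """H(A | B) for two aligned label lists."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    b_counts = Counter(labels_b)
    return -sum((c / n) * log(c / b_counts[b]) for (_, b), c in joint.items())


def homogeneity_completeness(truth, pred):
    ids = sorted(truth)
    t = [truth[i] for i in ids]
    p = [pred[i] for i in ids]
    n = len(ids)
    h_truth = _entropy(Counter(t).values(), n)
    h_pred = _entropy(Counter(p).values(), n)
    # A zero-entropy partition is trivially homogeneous / complete.
    homogeneity = 1.0 if h_truth == 0 else 1.0 - _conditional_entropy(t, p) / h_truth
    completeness = 1.0 if h_pred == 0 else 1.0 - _conditional_entropy(p, t) / h_pred
    return homogeneity, completeness


def singleton_recall(truth, pred):
    """Assumed definition: fraction of truth singletons left alone by the clusterer."""
    truth_sizes = Counter(truth.values())
    pred_sizes = Counter(pred.values())
    singletons = [a for a, c in truth.items() if truth_sizes[c] == 1]
    if not singletons:
        return 1.0  # assumption: nothing to absorb counts as a perfect score
    kept = sum(1 for a in singletons if pred_sizes[pred[a]] == 1)
    return kept / len(singletons)


truth = {"a": "C1", "b": "C1", "c": "C2"}
merged = {"a": "X", "b": "X", "c": "X"}  # everything fused into one cluster
print(homogeneity_completeness(truth, merged), singleton_recall(truth, merged))
```

Fusing everything into one cluster keeps completeness at 1 while homogeneity and singleton recall fall to 0, which is why no single number is reported alone.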
Fixtures
tests/fixtures/campaigns/ — YAML scenarios with paired
*.expected.yaml bound files. Six fixtures planned (see the design
doc); fixture 3 (lone_wolf) ships first because it exercises the full
DSL → factory → metrics pipeline against the simplest ground truth
(every actor is a singleton).
| # | Fixture | Property under test |
|---|---|---|
| 1 | shared_wordlist (planned) | credential overlap alone must not merge campaigns |
| 2 | vpn_hopping (planned) | actor identity survives IP/ASN churn |
| 3 | lone_wolf ✓ | opportunistic scanners stay singleton |
| 4 | paused_campaign (planned) | temporal gaps must not split a campaign |
| 5 | multi_operator (planned) | UKC phase handoff merges across operators with diverged infra |
| 6 | noise_floor (planned) | all of the above survive 10× background scanner pollution |
Each fixture's bounds (adjusted_rand_index.min, homogeneity.min,
etc.) are loose at v1 and ratchet up as the clusterer matures.
Loosening a bound to make CI pass requires PR-comment justification.
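A bounds check of this kind could look like the sketch below. The nested shape (metric name, then min) is inferred from the dotted names above; the function name check_bounds and the sample numbers are hypothetical, and the real *.expected.yaml layout may differ.

```python
def check_bounds(scores: dict[str, float],
                 bounds: dict[str, dict[str, float]]) -> list[str]:
    """Return one message per violated lower bound (empty list means pass)."""
    failures = []
    for metric, limits in bounds.items():
        minimum = limits.get("min")
        if minimum is None:
            continue  # no lower bound declared for this metric
        value = scores.get(metric)
        # A metric missing from the scores counts as a failure, not a pass.
        if value is None or value < minimum:
            failures.append(f"{metric}: got {value}, required >= {minimum}")
    return failures


# Illustrative values only; not taken from any shipped fixture.
bounds = {"adjusted_rand_index": {"min": 0.5}, "singleton_recall": {"min": 0.9}}
scores = {"adjusted_rand_index": 0.8, "singleton_recall": 0.75}
print(check_bounds(scores, bounds))
```

Returning every violation rather than stopping at the first makes a ratchet failure in CI easier to read.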
Running the tests
source .311/bin/activate
# all 18 tests
pytest tests/clustering/ -v
# scoped runs
pytest tests/clustering/test_metrics.py -v # metric sanity
pytest tests/clustering/test_campaign_factory.py -v # factory determinism + DSL validation
pytest tests/clustering/test_lone_wolf_fixture.py -v # end-to-end pipeline
tests/factories/campaign_factory.py is a library, not a test module.
Pytest will not collect it directly — invoke the test files in
tests/clustering/ instead.
What hasn't been built yet
- Clusterer worker (decnet clusterer). Connected-components on a similarity graph is the planned v1 algorithm; ML stays out until a fixture proves CC inadequate.
- campaigns table + attackers.campaign_id FK.
- Bus signals campaign.{id}.formed / campaign.{id}.updated.
- Dashboard surface: Campaigns list page + CampaignDetail with UKC phase timeline.
- Fixtures 1, 2, 4, 5, 6 — each property the algorithm must satisfy gets its own scenario.
- Replay tier — public-dataset replay (Honeynet SSH corpora, DShield) through the live collector. Reality check on whether our DSL captures the right dimensions. Post-v1.
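For orientation, the connected-components idea named in the clusterer item can be sketched with a union-find over thresholded pairwise similarity. Nothing here reflects the eventual worker: the function name, the Jaccard similarity, the feature strings, and the threshold are all hypothetical.

```python
from itertools import combinations


def connected_components(nodes, similarity, threshold):
    """Label each node with the representative of its connected component."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(nodes, 2):
        # An edge exists when similarity clears the threshold.
        if similarity(a, b) >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a
    return {n: find(n) for n in nodes}


# Hypothetical per-attacker feature sets.
features = {
    "a": {"hassh:aae6", "ja3:769"},
    "b": {"hassh:aae6", "ja3:771"},
    "c": {"hassh:ffff", "ja3:000"},
}


def jaccard(x, y):
    fx, fy = features[x], features[y]
    return len(fx & fy) / len(fx | fy)


labels = connected_components(list(features), jaccard, threshold=0.3)
print(labels["a"] == labels["b"], labels["a"] == labels["c"])
```

Connected components merge transitively: one strong edge is enough to fuse two groups, so a single shared artifact can chain unrelated attackers together. That is the behaviour the shared_wordlist and noise_floor fixtures are there to catch.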
DSL evolution (concurrent phases, multi-actor per phase, probabilistic
ordering) is documented as a deferred extension in
development/DEVELOPMENT_V2.md — the design doesn't block it; we just
don't need it before fixtures 1–6 ship.
See also: Service-Bus (where future
campaign.{id}.formed signals will live), Testing-and-CI,
Module-Reference-Workers.
DECNET — honeypot deception-network framework. Pre-1.0, active development — use with caution. See Sponsors to support the project. Contact: samuel@securejump.cl