Commit Graph

369 Commits

Author SHA1 Message Date
4c37ece39e feat(orchestrator): MVP synthetic life-injection worker (SSH only)
Adds a new decnet orchestrate worker whose job is to keep the honeypot
ecosystem from looking suspiciously static — a frozen LAN with no
inter-host traffic and no filesystem aging is its own honeypot tell.

MVP scope:
- New OrchestratorEvent table + repo methods (purpose-built sibling
  to Log so synthetic events stay separable from attacker-driven ones).
- New orchestrator.{activity,file}.<decky_id> bus topics +
  system.orchestrator.health heartbeat.
- SSH-only driver. Traffic action runs python3 inside src container
  to TCP-connect dst:22 and read the SSH banner — real on-the-wire
  SSH-protocol traffic without shipping creds. File action drops or
  refreshes a small file via docker exec on the destination.
- Random scheduler (50/50 traffic/file when >=2 SSH-capable deckies
  are running). Diurnal shaping, role-aware pairing, and session-aware
  backoff are explicit non-goals for MVP.
- CLI registration, systemd unit (SupplementaryGroups=docker),
  worker-registry entry so the dashboard shows orchestrator health.
- 11 tests: scheduler policy, driver argv shape + injection-safety,
  end-to-end one-tick integration with FakeBus + SQLite.
2026-04-26 19:43:20 -04:00
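The random scheduler policy described in 4c37ece39e can be sketched as below. This is an illustration only: the name pick_action and the tuple it returns are hypothetical, not taken from the repository.

```python
import random
from typing import Optional, Sequence, Tuple


def pick_action(
    ssh_deckies: Sequence[str], rng: random.Random
) -> Optional[Tuple[str, str, str]]:
    """Return (kind, src, dst) for one tick, or None when the tick is idle."""
    # The commit requires at least two running SSH-capable deckies.
    if len(ssh_deckies) < 2:
        return None
    # 50/50 split between the two MVP action kinds.
    kind = "traffic" if rng.random() < 0.5 else "file"
    # sample() draws without replacement, so src and dst always differ.
    # A file action would only use dst; src is returned for uniformity.
    src, dst = rng.sample(list(ssh_deckies), 2)
    return kind, src, dst
```

Passing the random.Random instance in keeps the policy deterministic under a seeded generator, which is one way a scheduler-policy test could pin its behavior.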
d531cea536 feat(web): read-only campaigns API + SSE + frontend
API: /api/v1/campaigns (paginated list), /api/v1/campaigns/{uuid}
(soft-merge chain follow), /api/v1/campaigns/{uuid}/identities
(member identities), and /api/v1/campaigns/events (SSE under
campaign.> + JWT-via-?token=, snapshot-on-connect). Mirror of the
identity router; same auth, same shape, same OpenAPI tags pattern.

Frontend: CampaignDetail.tsx page (same visual vocabulary as
IdentityDetail), useCampaignStream hook (mirror of
useIdentityStream), /campaigns/:id route, IdentityDetail's
CAMPAIGN badge becomes clickable and navigates to the campaign.
useIdentityStream now listens for identity.campaign.assigned so
the badge appears live without a manual refresh.
2026-04-26 09:20:17 -04:00
75af00c9c8 test(clustering): full-bound passes through production campaign clusterer
Runs the chained identity + campaign clustering pipeline against all
seven fixtures via from_synthetic / from_synthetic_identity adapters
and ratchets every YAML floor to 1.0: the production clusterer
and the reference clusterers used in the per-fixture tests all
score perfectly across ARI / homogeneity / completeness /
singleton_recall on each fixture.

Three substrate fixes surfaced by the ratchet:

- Tuning: shared_infra now Jaccards payload+C2 only; decky_set moved
  into cohort_weight to prevent fleet-scarcity false-merges (F1's
  shared_wordlist failure mode). Tier weight raised to 1.0 so
  shared payload+C2 alone crosses threshold (F5's intended pass).
- Adapter: from_synthetic_identity now reads SyntheticSession
  started_at + duration_s for session_windows and per-decky
  timestamps (the production-row adapter still uses start_ts/end_ts
  when available).
- Fixture data: paused_campaign.yaml's JA3 collided exactly with
  vpn_hopping.yaml's (same TLS extension list). The collision
  fused two unrelated campaigns under the chained identity layer
  in the noise_floor composite. Made paused's JA3 distinct.

Also adds Campaign / CampaignsResponse to models/__init__.py's
__all__, which the schema commit missed.
2026-04-26 09:13:59 -04:00
6936a1426c feat(clustering): campaign-clusterer worker + bus topics + CLI
The campaign clusterer worker mirrors the identity-side worker shell
(bus connect, heartbeat, control listener, slow-tick fallback) but
wakes on identity.> instead of attacker.> — campaign-level work is
gated on identity-layer changes, not raw observations.

The connected-components implementation reads identities via
list_identities_for_clustering, projects them with from_identity_row,
runs union-find over combined_campaign_weight, writes campaigns rows,
sets attacker_identities.campaign_id, and runs the same revocable-
merge pass as the identity layer (a merged-out campaign whose
identities no longer co-cluster with the winner gets revoked).

Bus: adds campaign.> family (formed / identity.assigned / merged /
unmerged) plus the cross-family identity.campaign.assigned so
existing identity-stream subscribers see the badge update without
having to subscribe to campaign.>. Wiki Service-Bus.md updated in
wiki-checkout in the same wave per the project's bus-signals
discipline.

CLI: decnet campaign-clusterer registered as master-only via
MASTER_ONLY_COMMANDS; --poll-interval / --daemon mirror the identity
clusterer command surface.
2026-04-26 09:04:00 -04:00
0946bab424 feat(clustering): campaign-level similarity primitives
The signal taxonomy for the campaign clusterer (next commit). Mirror
of the identity-layer module but with edge families that don't
translate 1:1: phase-handoff (load-bearing for F5 multi_operator —
the signal the identity-side fingerprint-disagreement veto deliberately
isn't), shared-infra (vetoed at identity level, primary positive
signal here), temporal-overlap (pairwise-relative — F7 invariance
preserved), cohort (weak supporting weight only).

Tier weights tuned so phase-handoff alone crosses threshold (F5),
shared-infra + temporal-overlap together cross (canonical co-op
pattern), and shared-infra + cohort together do NOT (F1
shared_wordlist's failure mode). The F7 time-shift invariant is
explicitly tested on every time-bearing edge and on the combined
weight.
2026-04-26 08:57:46 -04:00
0a1cf65ddb feat(db): Campaign SQLModel + repo write/read methods
Adds the campaigns table and the BaseRepository / SQLModelRepository
methods that the campaign-clusterer worker (next commit) needs to
populate it. Mirrors the AttackerIdentity layer: schema_version from
day one for federation gossip, soft-merge via merged_into_uuid with a
chain-walking get_campaign_by_uuid, list_campaigns excluding merged-
out rows while list_all_campaigns returns the unfiltered set for the
revoke pass. attacker_identities.campaign_id gets a real FK now that
the target table exists.
2026-04-26 08:54:28 -04:00
97aa57faed feat(api): SSE stream for identity events at /api/v1/identities/events
Mirrors GET /api/v1/topologies/{id}/events: subscribes to identity.>
on the bus for the duration of the request and forwards each event as
a named SSE frame (formed / observation.linked / merged / unmerged).

The endpoint is broadly scoped (every identity event, not per-uuid)
because both AttackerDetail and IdentityDetail need the same
firehose: AttackerDetail watches for an identity.formed that finally
binds its identity_id; IdentityDetail watches for
observation.linked / merged / unmerged against its current row. A
per-uuid filter would force the client to know its identity before
subscribing, which is not always possible.

JWT via ?token= (EventSource can't set headers), require_stream_viewer
gate, sse_connection_slot per-user cap, snapshot-on-connect with
the first 50 identities so the client buffer renders without a
separate REST call.

Bus-disabled / unreachable path keeps the connection alive on
keepalives so the client doesn't reconnect-storm; it can re-poll
the REST API on its own timer.
2026-04-26 08:36:17 -04:00
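For reference, a named SSE frame of the kind 97aa57faed forwards can be serialized as in this sketch. The helper name format_sse_frame and the payload shape are assumptions; only the event names (formed, observation.linked, merged, unmerged) come from the commit.

```python
import json
from typing import Any, Mapping


def format_sse_frame(event: str, payload: Mapping[str, Any]) -> str:
    """Serialize one named server-sent-event frame."""
    # json.dumps escapes newlines inside strings, so the payload always
    # fits on a single "data:" line.
    data = json.dumps(payload, separators=(",", ":"))
    # The trailing blank line terminates the frame.
    return f"event: {event}\ndata: {data}\n\n"
```

On the browser side, a named frame is delivered to the EventSource listener registered for that event name rather than to the default message handler.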
e364ef8859 feat(clustering): revocable merges (merge + unmerge)
Reworks the clusterer's tick to handle multi-identity components and
re-evaluate prior merges. Two passes per tick:

Pass 1 — per-component reconciliation:
  * Fresh component → mint identity (commit 4 path).
  * Single-identity component → link unassigned observations.
  * Multi-identity component → soft-merge: pick the smallest-uuid
    winner deterministically, set merged_into_uuid on each loser,
    link unassigned observations to the winner. Observations stay
    FK'd to their original identity row — the merge is a soft
    pointer, not a re-point. Audit trail preserved; cached
    subscribers resolve through the chain.

Pass 2 — revocable-merge undo:
  * For each merged-out identity, check whether its observations
    still cluster with its winner's. If not, the merge is
    contradicted by new evidence — clear merged_into_uuid and emit
    identities_unmerged. The resurrected identity keeps its original
    uuid, so subscribers that cached it during the merged interval
    re-attach without a new lookup.

A pre-built merge-chain dict feeds Pass 1 so the effective-identity
lookup is O(1) per observation. The chain has a hop cap (paranoia
against accidental cycles in the underlying state).

Repo additions on BaseRepository + SQLModelRepository:
  * list_all_identities() — includes merged-out rows.
  * update_identity_merged_into(uuid, winner_or_None) — single
    setter for both merge and unmerge.
DummyRepo coverage stub updated.

Tests:
  * Two distinct identities bridged by a new observation merge with
    the smaller uuid as winner.
  * A pre-seeded soft-merge whose underlying observations diverge
    gets revoked; resurrected uuid emerges with merged_into_uuid
    cleared.
  * Tick is idempotent under no state changes.
2026-04-26 08:33:32 -04:00
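The soft-merge mechanics in e364ef8859 (smallest-uuid winner, hop-capped chain walk, unmerge by clearing the pointer) can be sketched as follows. The function names and the cap of 16 hops are hypothetical; the commit does not state the real cap.

```python
from typing import Dict, Iterable, Optional

MAX_HOPS = 16  # hypothetical value; the commit only says a cap exists


def pick_winner(uuids: Iterable[str]) -> str:
    """Deterministic winner: the lexicographically smallest uuid."""
    return min(uuids)


def resolve(
    uuid: str, merged_into: Dict[str, Optional[str]], max_hops: int = MAX_HOPS
) -> str:
    """Follow merged_into pointers until a row with no pointer is reached."""
    current = uuid
    for _ in range(max_hops):
        nxt = merged_into.get(current)
        if nxt is None:
            return current
        current = nxt
    # Guard against accidental cycles in the stored pointers.
    raise RuntimeError(f"merge chain from {uuid} exceeds {max_hops} hops")


def build_effective(merged_into: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Pre-resolve every identity once so later lookups are O(1)."""
    return {u: resolve(u, merged_into) for u in merged_into}
```

Unmerging under this model is just setting merged_into[loser] back to None, after which resolve(loser, ...) returns the loser's own uuid again.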
87412da1ca test(clustering): F6 noise-floor ratchets for production clusterer
Two targeted invariants instead of a wholesale YAML-bounds re-use,
because the existing F6 bounds were tuned for the reference
composite_signals_clusterer (fingerprint OR C2). The production
clusterer trades that aggregation for tier discipline + the
fingerprint-disagreement veto, so its score profile differs even
when its judgments are correct — multi_operator stays as 2 truth
identities, paused_campaign's two DSL actors remain a single cluster
because they share fingerprints, etc. Wholesale bounds re-use would
fight the design.

The two production-side ratchets:

1. singleton_recall ≥ 0.95 at campaign-level scoring — truth-
   singleton noise scanners must not be absorbed into real campaigns.
   This is the F6 failure mode that motivates the fixture.

2. Intra-campaign recovery under cross-corpus interference:
   * vpn_hopping's 5 rotations consolidate to one cluster.
   * shared_wordlist A and B stay in disjoint clusters despite
     sharing credentials with each other (and with the noise floor).

A future commit can revisit when the production clusterer's identity-
level truth alignment improves (e.g. when paused_campaign's DSL is
extended to mark its two actors as one truth identity).
2026-04-26 08:28:31 -04:00
7923006203 test(clustering): F7 slow-burn time-agnostic invariant
Fixture 7 ratchet: one campaign across 3 multi-week operational
windows with stable JA3 + HASSH + C2. The production clusterer must
fold all 3 into one cluster despite multi-week silence between
windows; completeness = 1.0.

Time-shift invariance test: applying a +90 day delta to every
session start (and the per-attacker first/last seen) must produce
the same cluster membership as the baseline. This is the runtime
counterpart of the static no-time-fields check on Observation. If
either check ever fails, the clusterer has accidentally grown a
recency-aware edge — fixture 7's whole reason for existing.
2026-04-26 08:26:23 -04:00
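The time-shift invariance check from 7923006203 can be illustrated with a toy clusterer. Everything here is a stand-in: membership groups by JA3 only, and the session dict keys are assumptions rather than the project's real schema.

```python
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, List


def membership(sessions: List[Dict]) -> FrozenSet[FrozenSet[str]]:
    """Toy time-agnostic clusterer: group attacker ids by JA3 only."""
    groups: Dict[str, set] = {}
    for s in sessions:
        groups.setdefault(s["ja3"], set()).add(s["attacker"])
    return frozenset(frozenset(g) for g in groups.values())


def shifted(sessions: List[Dict], delta: timedelta) -> List[Dict]:
    """Return copies of the sessions with every start time moved by delta."""
    return [{**s, "started_at": s["started_at"] + delta} for s in sessions]


def is_time_shift_invariant(sessions: List[Dict], days: int = 90) -> bool:
    """True when a uniform shift leaves cluster membership unchanged."""
    moved = shifted(sessions, timedelta(days=days))
    return membership(sessions) == membership(moved)
```

A uniform shift preserves the gaps between sessions, so a check of this shape targets dependence on absolute time, such as decay measured against the current date.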
6a4592a8f5 test(clustering): low/very-low tier safety + F1/F2 ratchets
Pins down the tier-discipline contract end-to-end:

- Credentials-only overlap doesn't fuse observations (F1 in
  miniature).
- ASN-only overlap doesn't fuse observations (F2 in miniature).
- All three weak tiers (medium + low + very-low) stacked still
  don't fuse — only a high-tier signal does.
- F1 (shared_wordlist) at identity-level: no false merges, every
  row is its own predicted cluster, homogeneity = 1.0.
- F2 (vpn_hopping): 5 distinct ASNs collapse into 1 predicted
  cluster, proving JA3 / HASSH dominate ASN as the design
  requires.

The combination math itself was wired in commit 5; this commit is
the failure-mode regression suite that gates future tuning of the
tier weights.
2026-04-26 08:25:23 -04:00
ed323581fe feat(clustering): fingerprint-disagreement veto for fixture 5
Two operators cooperating on one campaign can share C2 endpoints +
stage-1 payloads while running distinct tooling — fixture 5
(multi_operator) is the canonical demonstration. The identity
clusterer must NOT fuse them: shared infra is a campaign-level
signal, not an identity-level one. The campaign clusterer (downstream
work) handles that grouping over identities.

Mechanism: when two observations have non-null fingerprints AND the
fingerprints fully disagree, the high-weight tier drops the payload
and C2 contributions to zero. JA3 / HASSH agreement still returns
1.0 directly — no veto applies when something agrees. Partial
agreement (one slot agrees, another disagrees) is treated as
agreement, since stable-tool partial overlap is more consistent
with one identity than two.

The veto only triggers when there is actual disagreement evidence —
two un-fingerprinted observations sharing a C2 still cluster, since
the absence of fingerprints is not the same as disagreement on them.

Fixture 5 production-clusterer assertion added at identity level:
ARI = 1.0, homogeneity = 1.0, exactly 2 predicted clusters from
2 truth identities. Phase-handoff edges (from the TODO) belong to
the downstream campaign clusterer, not this identity clusterer.
2026-04-26 08:24:22 -04:00
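One way to read the veto mechanism in ed323581fe is sketched below. The Obs dataclass and high_tier_score are hypothetical names, and "fully disagree" is interpreted here as: no fingerprint slot agrees and at least one slot populated on both sides differs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Obs:
    ja3: Optional[str] = None
    hassh: Optional[str] = None
    payload_hash: Optional[str] = None
    c2: Optional[str] = None


def _same(a: Optional[str], b: Optional[str]) -> bool:
    return a is not None and a == b


def _differs(a: Optional[str], b: Optional[str]) -> bool:
    return a is not None and b is not None and a != b


def high_tier_score(a: Obs, b: Obs) -> float:
    # Any fingerprint agreement wins outright; partial agreement counts.
    if _same(a.ja3, b.ja3) or _same(a.hassh, b.hassh):
        return 1.0
    # Veto: real disagreement evidence zeroes payload and C2 contributions.
    if _differs(a.ja3, b.ja3) or _differs(a.hassh, b.hassh):
        return 0.0
    # No fingerprint evidence either way: shared infra still counts.
    if _same(a.payload_hash, b.payload_hash) or _same(a.c2, b.c2):
        return 1.0
    return 0.0
```

With this reading, the two fixture-5 operators (distinct JA3/HASSH, shared C2) score 0.0, while two un-fingerprinted observations sharing a C2 score 1.0.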
f7da33726c feat(clustering): combined edge weight + medium-tier wiring
The clusterer now drops a single high-tier function call in favor of
a tier-weighted sum. Tier multipliers (high=1.0, medium=0.6, low=0.2,
very_low=0.05) are tuned so the threshold (1.0) admits high-tier
agreement alone while leaving every weaker tier — and every
combination of weaker tiers — under threshold.

Per-tier discipline tested:
- high alone clusters
- medium alone does NOT cluster (supporting signal only)
- low alone does NOT cluster (fixture 1's failure mode)
- very-low alone does NOT cluster (fixture 2's failure mode)
- all three weak tiers stacked still don't reach threshold
- high + medium clusters (high already saturates)

The combination is forward-compatible: low + very-low contributions
are computed today but always project to 0.0 because the production
adapter doesn't populate credentials / ASN-edge inputs into the
fixture path yet. Their contribution becomes load-bearing in commit 7
when the low-tier landing tightens the F1 / F2 bounds.

Fixture 4 (paused_campaign) ratchet added: high-tier signal carries
the multi-day-silence campaign into one identity. Time-agnostic
invariant — silence is irrelevant to the edge weight.
2026-04-26 08:22:10 -04:00
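The tier arithmetic in f7da33726c can be checked with a short sketch. The multipliers and the 1.0 threshold are the ones quoted in the commit; the function names and the inclusive comparison are assumptions (the comparison has to be inclusive for a lone high-tier score of 1.0 to pass).

```python
from typing import Mapping

TIER_WEIGHTS = {"high": 1.0, "medium": 0.6, "low": 0.2, "very_low": 0.05}
THRESHOLD = 1.0


def combined_weight(scores: Mapping[str, float]) -> float:
    """Tier-weighted sum of per-tier scores, each expected in [0, 1]."""
    for tier, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{tier} score {score} is outside [0, 1]")
    return sum(TIER_WEIGHTS[tier] * score for tier, score in scores.items())


def should_link(scores: Mapping[str, float]) -> bool:
    """Inclusive comparison so a saturated high tier alone links."""
    return combined_weight(scores) >= THRESHOLD
```

With every weak tier saturated the sum is 0.6 + 0.2 + 0.05 = 0.85, which stays under the threshold, matching the "all three weak tiers stacked" case in the list above.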
de2f4c3a62 feat(clustering): wire high-weight edges end-to-end
The connected-components clusterer now writes attacker_identities
rows + sets attackers.identity_id when high-weight signals (JA3 /
HASSH / payload-hash / C2-endpoint exact match) agree across
observations. Singletons stay un-fingerprinted and un-clustered.

Algorithm split:
- cluster_observations(observations) — pure union-find over the
  high-weight edge function. Same code path for fixture validation
  and production tick.
- from_attacker_row(row) — production-row adapter; recovers JA3 +
  HASSH from Attacker.fingerprints JSON. Payload + C2 join from
  logs in later commits; the function shape doesn't change.

Repo additions on BaseRepository + SQLModelRepository:
- list_attackers_for_clustering(limit=None)
- create_attacker_identity(row)
- set_attacker_identity_id(attacker_uuid, identity_uuid)
DummyRepo coverage stub updated.

v1 behavior is conservative: only assigns identities to observations
whose identity_id is currently NULL. Multi-identity components are
skipped this pass — merge / re-assign lands in commit 10 with
revocable merges.

Fixture bounds tightened against the production clusterer:
- lone_wolf (F3) — singletons stay singletons
- shared_wordlist (F1) — credential-only overlap doesn't cluster
  (high-weight tier doesn't include credentials)
- vpn_hopping (F2, identity-level) — 5 rotated IPs with stable JA3
  + HASSH fold into one identity, ARI = 1.0, completeness = 1.0
2026-04-26 08:19:56 -04:00
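A minimal union-find pass over exact-match fingerprints, in the spirit of the cluster_observations split described in de2f4c3a62, might look like this. It is a simplified stand-in: rows are bare (ja3, hassh) tuples, and payload-hash and C2 matching are left out.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[Optional[str], Optional[str]]  # (ja3, hassh)


def cluster_by_exact_match(rows: Sequence[Row]) -> List[List[int]]:
    """Union row indexes that share a non-null value in the same slot."""
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Remember the first row seen for each (slot, value) and union later
    # rows into it; this avoids comparing every pair of rows.
    first_seen: Dict[Tuple[int, str], int] = {}
    for idx, row in enumerate(rows):
        for slot, value in enumerate(row):
            if value is None:
                continue
            key = (slot, value)
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups: Dict[int, List[int]] = {}
    for idx in range(len(rows)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

Rows with no fingerprint never create an edge, so they come out as single-member groups.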
a9775c4000 feat(clustering): similarity-graph primitives
Adds the four weight-tier edge functions as pure, time-agnostic
scoring primitives over an Observation projection. Each returns a
score in [0, 1]; the connected-components impl will combine + threshold
in subsequent commits.

Tier semantics (from IDENTITY_RESOLUTION.md):
- high   — JA3/HASSH/payload-hash/C2-endpoint exact match
- medium — phase-bucketed command-sequence Jaccard
- low    — credential-attempt-set Jaccard (defeated alone by F1)
- very low — ASN equality (defeated alone by F2)

Time-agnostic invariant is a static test: Observation has no time
fields, so no edge function can silently start using them. Fixture 7
forbids recency-decay clustering on multi-month APT campaigns.

A from_synthetic() adapter projects SyntheticAttacker corpora into
Observation; the production-row adapter lands when the clusterer
starts reading the attackers table.
2026-04-26 08:13:29 -04:00
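The medium and low tiers in a9775c4000 are both described as Jaccard scores. A minimal version of that primitive is shown here; returning 0.0 for two empty sets is an assumption, chosen so that missing evidence never counts as similarity.

```python
from typing import AbstractSet, Hashable


def jaccard(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """Size of the intersection over size of the union, in [0, 1]."""
    union = a | b
    if not union:
        return 0.0  # assumption: no evidence scores as no similarity
    return len(a & b) / len(union)
```

For example, two credential sets that share one of three distinct pairs score 1/3.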
fb522af107 feat(bus): reserve identity.unmerged topic
Revocable merges (a contradiction-driven undo of identity.merged) ship
in the clusterer work; this reserves the topic up-front so identity.>
subscribers receive it day one without a re-subscribe.

The clusterer worker's ClusterResult fan-out now publishes on
identity.unmerged when populated. The skeleton clusterer never
populates it; the revocable-merge commit will.

Wiki update lives in wiki-checkout/Service-Bus.md (separate repo).
2026-04-26 08:10:56 -04:00
e545f7d8d3 feat(clustering): identity clusterer worker skeleton
Adds the decnet clusterer master-only command + provider-subpackage
shape (base.py + factory.py + impl/connected_components.py) so
subsequent commits can land similarity-graph features without
churning callers.

The skeleton ConnectedComponentsClusterer.tick is a no-op; the
worker shell is fully wired (bus consumer on attacker.observed +
attacker.scored, slow-tick fallback, health heartbeat, control
listener, ClusterResult fan-out to identity.formed/observation.linked
/merged). Subscribers on identity.> see no events from this clusterer
until edge functions land, but the lifecycle is in place.
2026-04-26 08:09:11 -04:00
6b6a808a4a test(clustering): fixture 7 slow_burn + recency_decay reference
Multi-month APT campaign modeling real APT operational tempo: recon
over weeks, exploitation later, action-on-objectives later still.
The unique signal this fixture stresses is TIME-AGNOSTIC IDENTITY
across multi-week silences — a clusterer that silently expires old
edges fragments any campaign that operates over months.

Three DSL actors represent the operator's three operational windows
(week 2, month 2, month 3 of a 90-day campaign), all sharing JA3 +
HASSH + payload + C2 callback. Campaign-level fixture only — the
three actors mint distinct truth_identity_id rows by design (same
modeling caveat as fixtures 4 and 5).

The fixture's narrative mirrors how an APT works a deep nested
topology (DECNET MazeNET mode): map decoy networks for weeks, only
then commit to exploitation. Slow-and-low pacing is the signal.

recency_decay_clusterer added to fixture_harness — same edge
construction as composite_signals_clusterer, but each edge weighted
by exp(-time_distance / half_life_days) and dropped below a
threshold. Adversarial reference for slow_burn: with 14-day half-
life and 0.5 threshold, edges between operational windows (24+ days
apart) decay below threshold and drop. The campaign fragments into
three clusters; completeness collapses.

This is the canonical production failure mode for graph clusterers
that bound memory or bias toward "what's hot" by silently expiring
old edges. Catching it in synthetic data is what fixture 7 exists
for; the replay tier will surface real-world drift / dwell patterns
that calibrate the half-life threshold the real algorithm should
tolerate.

Four tests: corpus shape (window-isolated sessions, stable
fingerprint), pipeline pass via composite_signals_clusterer (time-
agnostic — folds all three windows), adversarial fragmentation
(3 clusters at 14-day half-life), long-half-life sanity (gentle
decay unions everything; confirms behavior depends on the half-life
parameter, not on something unrelated).
2026-04-26 07:58:23 -04:00
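The decay arithmetic quoted in 6b6a808a4a can be reproduced with a few lines. The function names are hypothetical; the formula, the 14-day parameter, and the 0.5 threshold are the ones given in the commit.

```python
import math


def decayed_weight(gap_days: float, half_life_days: float) -> float:
    """Edge weight exp(-gap / half_life) as quoted in the commit message."""
    return math.exp(-gap_days / half_life_days)


def edge_survives(
    gap_days: float, half_life_days: float, threshold: float = 0.5
) -> bool:
    """True when the decayed edge is still at or above the drop threshold."""
    return decayed_weight(gap_days, half_life_days) >= threshold
```

At a 24-day gap with a 14-day parameter the weight is exp(-24/14), roughly 0.18, so the edge drops. Under this exact formula the weight falls to 0.5 at half_life * ln 2, about 9.7 days for a 14-day setting, so the parameter behaves as an e-folding constant rather than a strict half-life.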
7021fda0e6 test(clustering): fixture 6 noise_floor (composite + cross-corpus)
Bundles all five prior fixtures' campaigns into one corpus alongside
10 fresh Delivery-only noise scanners (on top of lone_wolf's 8
inherited). The fixture covers cross-corpus interference — signal
collisions across fixtures' JA3/HASSH/C2 strings, factory ID re-use,
clusterer ambiguity that only manifests when multiple campaigns
score together. Each constituent fixture already ships its own
in-fixture adversarial test; this one is the control for the class
of failures that single-corpus fixtures cannot catch.

Composition is declared via a fixture-6-specific include_fixtures
block in noise_floor.yaml. The test file's loader expands it into
a full corpus.campaigns spec at runtime so the factory itself stays
unaware — no factory primitive added for what only this fixture
needs. The 8 noise scanners declared by lone_wolf flow through
naturally; the extra_noise_scanners count adds 10 more.

composite_signals_clusterer (added in the fixture-5 commit) is the
pass clusterer — union-find combining (ja3, hassh) match OR
overlapping C2 callback. Approximates the planned similarity graph
well enough that every campaign resolves and every singleton stays
singleton in the merged corpus.

Three tests: corpus integrity (every campaign id present, 12
campaign-driven attackers + 18 noise = 30 total), pipeline pass
against the global bounds, and an explicit singleton-recall
assertion (21 truth-singletons — 1 lone wolf, 18 noise, 2
shared_wordlist actors whose campaigns are size 1 — all kept
singleton by the composite clusterer). Singleton recall is the
load-bearing metric here: noise absorption is the failure mode
that makes campaign attribution useless in practice.
2026-04-26 07:49:36 -04:00
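Since 7021fda0e6 leans on singleton recall, here is one plausible definition, written as a sketch. The real function in tests/clustering/metrics.py may differ; this version assumes the metric is the fraction of truth singletons that also sit alone in their predicted cluster.

```python
from collections import Counter
from typing import Hashable, Sequence


def singleton_recall(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Fraction of truth-singleton rows that are also predicted singletons."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must label the same rows")
    truth_sizes = Counter(truth)
    pred_sizes = Counter(pred)
    singles = [i for i, label in enumerate(truth) if truth_sizes[label] == 1]
    if not singles:
        return 1.0  # assumption: vacuously perfect when no singletons exist
    kept = sum(1 for i in singles if pred_sizes[pred[i]] == 1)
    return kept / len(singles)
```

Under this definition, a clusterer that absorbs one of two noise scanners into a real campaign scores 0.5.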
27f7de9886 test(clustering): fixture 5 multi_operator + c2/shift/composite refs
Three new reference clusterers in fixture_harness:

* c2_callback_clusterer — union-find on overlapping C2 callback
  sets across an attacker's sessions. Pass-clusterer for fixture 5
  where two operators with distinct tooling share a C2 endpoint as
  the campaign signal.

* shift_clusterer — deliberately-bad reference that buckets
  attackers by majority session-start hour into night/day/swing.
  Adversarial reference for fixture 5; proves operational schedule
  is NOT a campaign signal.

* composite_signals_clusterer — union-find combining (ja3, hassh)
  match OR overlapping C2 callback. Will serve as the pass-
  clusterer for fixture 6 (noise_floor) where multiple campaigns
  with heterogeneous signal types are scored together.

Also factored a small _union_find helper for the new clusterers
(existing time_window/credential_jaccard left untouched to avoid
mixing refactor with feature work).

Fixture 5 (multi_operator): one campaign, two operators with
distinct UKC roles. Actor A (broker, night shift): Delivery →
Exploitation → Persistence → C2. Actor B (post-ex, day shift):
Discovery → Lateral Movement → Collection → Exfiltration.
Distinct JA3/HASSH/ASN/IPs; shared C2 + payload hash.

Four tests: corpus shape (distinct fingerprints, shared C2,
disjoint shifts), pipeline pass via c2_callback_clusterer,
explicit harness sanity that fingerprint_clusterer cannot resolve
this fixture (documents which signal carries the campaign), and
adversarial shift_clusterer fragmentation.

Phase-handoff edges (the real load-bearing signal per the design
doc) wait for the production clusterer; this fixture will prove
they're needed when it ships.
2026-04-26 07:46:14 -04:00
304592abfe test(clustering): fixture 4 paused_campaign + active_days/time_window
Adds the actor.active_days primitive to the campaign factory so a
DSL actor can be bound to specific day indexes. Falls back to the
non-paused day pool when absent (existing fixtures unchanged).
Intersects with pause_windows so the campaign-wide silence still
wins if both are set.

Adds time_window_clusterer reference to fixture_harness — union-find
over attackers, edge if their session time-ranges are within
gap_days of each other. Deliberately-bad reference for fixture 4:
multi-day silent stretches fragment a single campaign because the
clusterer has no signal that bridges the gap.

Fixture 4 (paused_campaign): one campaign modeled as two DSL actors
representing the operator's two operational windows (active days
1-2 and 6-7), separated by a silent stretch (days 3-5). Both share
JA3 + HASSH + payload + C2 callback; only their active_days differ.

Five tests: corpus shape (rows in their windows, shared signals),
pipeline pass via fingerprint_clusterer at level=campaign,
adversarial fragmentation via time_window_clusterer (1-day union
threshold cannot bridge the 4-day silence → completeness collapses),
huge-gap sanity (gap_days=10 unions both halves), silent-stretch
invariant (no session leaks into the configured pause window).

Identity-level scoring is fixture 2's job; this fixture is
campaign-level only — modeling caveat documented in the YAML.
2026-04-26 07:39:46 -04:00
0def6f7e37 test(clustering): fixture 2 vpn_hopping + fingerprint/asn references
One campaign, one DSL actor, ip_pool: rotating + rotation_count: 5
across 5 synthetic private-use ASNs (RFC 6996 64512-64516). Stable
JA3, HASSH, and payload_hash across every rotation — these are the
"signals the attacker can't cheaply rotate" per IDENTITY_RESOLUTION.md
and the load-bearing reason all 5 observation rows must resolve to
one identity / one campaign.

Two new reference clusterers in fixture_harness.py:

* fingerprint_clusterer — groups by (ja3, hassh). Un-fingerprinted
  rows stay singleton so it doesn't trivially fuse all noise into one
  mega-cluster. Approximates the stable-signal arm of the planned
  similarity graph.

* asn_clusterer — deliberately-bad reference for fixture 2's
  adversarial test. Group-by-ASN shatters the campaign into 5
  singletons; completeness collapses to 0.

Four tests in test_vpn_hopping_fixture.py: corpus shape (5 rows, 1
identity, 1 campaign, 5 distinct ASNs/IPs, stable fingerprints),
pass at campaign level, pass at identity level (asserts ARI exactly
1.0), asn_clusterer breaches the completeness floor.
2026-04-26 07:34:18 -04:00
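The contrast between the two reference clusterers in 0def6f7e37 can be shown with plain group-by code. This is a sketch with dict rows, not the harness implementation; treating a row as un-fingerprinted only when both JA3 and HASSH are missing is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_fingerprint(rows: Sequence[Dict]) -> List[List[int]]:
    """Group row indexes by (ja3, hassh); un-fingerprinted rows stay alone."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for idx, row in enumerate(rows):
        ja3, hassh = row.get("ja3"), row.get("hassh")
        if ja3 is None and hassh is None:
            key = ("singleton", idx)  # unique key prevents a noise mega-cluster
        else:
            key = ("fp", ja3, hassh)
        groups[key].append(idx)
    return list(groups.values())


def group_by_asn(rows: Sequence[Dict]) -> List[List[int]]:
    """Deliberately weak reference: group row indexes by ASN only."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row["asn"]].append(idx)
    return list(groups.values())
```

On five rotations with a stable fingerprint and five distinct ASNs, the first function returns one group of five and the second returns five singletons.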
f6b83755eb test(clustering): factory honors ip_pool: rotating + 3-level truth labels
Fifth and final commit of the identity-resolution substrate. Unblocks
fixture 2 (vpn_hopping) by making the synthetic factory match
production shape: an actor rotating across N IPs produces N
SyntheticAttacker rows that share fingerprints + truth_identity_id but
differ on ip / asn — exactly the shape the future clusterer needs to
recover via JA3/HASSH match.

Factory:
* SyntheticSession + SyntheticAttacker gain truth_identity_id field.
* DSL: ip_pool: rotating + rotation_count: N produces N observation
  rows per actor. Optional rotation_asns: [...] cycles ASN per row;
  defaults to the actor's primary asn.
* Sessions distribute round-robin across the actor's rotated rows.
* Noise scanners get truth_identity_id == truth_actor_id ==
  truth_campaign_id (each is its own singleton at every level).
* GeneratedCorpus.truth_labels(level=) accepts "campaign" (default,
  back-compat), "identity", or "actor" — picks the oracle the
  metric harness scores against.

Harness:
* assert_fixture_bounds gains truth_level kwarg (default "campaign")
  so identity-resolution fixtures can score against truth_identity_id
  without churning the campaign-clustering test files.

Tests: 9 new (rotation_count emits N rows, shared identity +
fingerprints, distinct IPs, rotation_asns distribution + cycling,
round-robin session distribution, identity-level truth labels,
sticky default unchanged, sessions inherit identity label).
598 tests green across clustering / factories / db / web / bus /
profiler / correlation.
2026-04-26 07:19:39 -04:00
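The rotation behavior described in f6b83755eb (ASN cycling per row, round-robin session distribution) reduces to two small helpers. Names and row shape here are hypothetical and much simpler than the real factory.

```python
from itertools import cycle
from typing import Dict, List, Optional, Sequence


def rotated_rows(
    actor_id: str,
    ips: Sequence[str],
    primary_asn: int,
    rotation_asns: Optional[Sequence[int]] = None,
) -> List[Dict]:
    """One observation row per IP; ASNs cycle, defaulting to the primary."""
    asns = cycle(rotation_asns or [primary_asn])
    return [
        {"truth_identity_id": actor_id, "ip": ip, "asn": next(asns)}
        for ip in ips
    ]


def assign_sessions(n_sessions: int, n_rows: int) -> List[int]:
    """Round-robin: session i lands on row i modulo the row count."""
    return [i % n_rows for i in range(n_sessions)]
```

Every row keeps the same truth_identity_id, which is what lets an identity-level oracle treat the rotated IPs as one actor.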
4f1077be72 feat(bus): identity.* topic family (formed / observation.linked / merged)
Fourth of the five-step identity-resolution substrate. Constants and
builder ship now; no publishers exist yet — they land with the
clusterer worker. Subscribers (webhook worker, dashboard SSE relay)
can register against identity.> from day one.

* decnet/bus/topics.py — IDENTITY root + IDENTITY_FORMED /
  IDENTITY_OBSERVATION_LINKED / IDENTITY_MERGED leaves; identity()
  builder mirroring the attacker() / system() helpers. Module
  docstring topic-tree updated.
* tests/bus/test_topics.py — assert builder produces the expected
  three topic strings + rejects empty event_type.

Wiki Service-Bus.md and a new Identity-Resolution.md page land in the
companion wiki-checkout commit.
2026-04-26 07:15:44 -04:00
dc3d08dd41 feat(web): read-only /api/v1/identities/* endpoints + repo methods
Second of the five-step identity-resolution substrate. Ships the API
surface against the empty AttackerIdentity table from commit 1 — every
endpoint returns empty/404 cleanly until the clusterer populates rows.

Routes (auth-gated, viewer role):
* GET /api/v1/identities — paginated list, excludes merged-out rows
* GET /api/v1/identities/{uuid} — detail; transparently follows
  merged_into_uuid to surface the canonical winner
* GET /api/v1/identities/{uuid}/observations — Attacker rows FK'd
  to the (resolved) identity uuid

Repository (BaseRepository abstract + SQLModelRepository concrete):
* get_identity_by_uuid (with merge-chain following, hop-bounded)
* list_identities / count_identities (excluding merged-out)
* list_observations_for_identity / count_observations_for_identity

Tests: 12 new (empty-table behavior, seeded data, merge-chain
resolution, repo-level smoke against real SQLite). Also fixes the
pre-existing test_base_repo_coverage failure (DEBT-041 added abstract
methods without updating the DummyRepo stub); the fix is included here
because this PR adds 5 more abstract methods.

474 db/web/profiler/correlation tests green.
2026-04-26 07:08:55 -04:00
84c1ca9c9b feat(identity): AttackerIdentity table + nullable attackers.identity_id FK
Schema-only commit, first of the five-step substrate for identity
resolution. The clusterer that populates identities lands later; this
ships the table empty and the FK uniformly NULL on existing rows.

* decnet/web/db/models/attackers.py — new AttackerIdentity SQLModel
  (uuid PK, schema_version, fingerprint summary lists, kd_digraph_simhash,
  merged_into_uuid self-FK, all clusterer-populated fields nullable).
  Attacker grows a nullable indexed identity_id FK + docstring marking
  it as the per-IP observation row.
* decnet/web/db/models/__init__.py — re-exports AttackerIdentity.
* tests/db/test_identity_schema.py — 9 schema invariants: table exists,
  identity_id nullable + indexed, FK targets attacker_identities.uuid,
  schema_version defaults to 1, attacker rows inserted with NULL
  identity_id, FK constraint blocks orphans.

463 unrelated db/web/profiler/correlation tests still green. See
development/IDENTITY_RESOLUTION.md for the full design.
2026-04-26 07:00:24 -04:00
e80f3eec54 test(clustering): fixture 1 (shared_wordlist) + fixture-harness extraction
Two campaigns sharing a credential wordlist; everything else (ASN, IPs,
JA3, HASSH, active hours) divergent. Pass condition: clusterer must NOT
merge. Protects against the "credential overlap is identity" failure
mode that commodity wordlists invite.

* tests/clustering/fixture_harness.py — shared assert_fixture_bounds
  helper + identity_clusterer (placeholder, trivially correct on
  all-singleton fixtures) + credential_jaccard_clusterer (deliberately-
  bad reference used to PROVE the fixture catches what it should).
* tests/clustering/test_shared_wordlist_fixture.py — bounds pass with
  identity, bounds FAIL (homogeneity → 0) with the bad credential
  clusterer. The latter is the proof the fixture earns its keep.
* tests/fixtures/campaigns/shared_wordlist.{yaml,expected.yaml}.
* tests/clustering/test_lone_wolf_fixture.py — refactored onto the
  shared harness. No behavior change.
2026-04-26 06:38:17 -04:00
00254629f8 feat(clustering): UKC phase enum + synthetic campaign factory + metric harness
Pre-implementation scaffolding for campaign clustering. The simulator is
the spec — algorithm code follows once fixtures + metrics are stable.

* decnet/clustering/ukc.py — UKCPhase enum (19 phases across In/Through/Out
  stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns
  with future MITRE ATT&CK tagging so synthetic data and runtime phase
  inference don't need renaming when TTP-tagging lands.
* tests/factories/campaign_factory.py — YAML DSL parser + deterministic
  generator emitting truth-labeled SyntheticAttacker / SyntheticSession
  records. Validates phase names, warns on unobservable phases, supports
  multi-campaign + noise corpora.
* tests/clustering/metrics.py — pure-Python ARI / homogeneity /
  completeness / singleton_recall (no sklearn dep). Decided before any
  algorithm exists, on purpose.
* tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml} — fixture 3
  from the design doc; simplest of the six, exercises the full pipeline
  with an identity-clusterer placeholder.
* development/CAMPAIGN_CLUSTERING.md — design spec for the feature.
* development/DEVELOPMENT_V2.md — note on DSL evolution path
  (concurrent phases, multi-actor per phase) deferred post-v1.
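
For reference, a pure-Python adjusted Rand index can be written in a few lines. This is a minimal sketch of the metric itself, not the contents of tests/clustering/metrics.py; the function name and signature are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth: list, predicted: list) -> float:
    """Adjusted Rand index between two flat labelings (hypothetical helper)."""
    if len(truth) != len(predicted):
        raise ValueError("label lists must have the same length")
    # Pair counts from the contingency table and from its row/column marginals.
    sum_cells = sum(comb(n, 2) for n in Counter(zip(truth, predicted)).values())
    sum_rows = sum(comb(n, 2) for n in Counter(truth).values())
    sum_cols = sum(comb(n, 2) for n in Counter(predicted).values())
    total = comb(len(truth), 2)
    if total == 0:
        return 1.0  # zero or one item: the labelings trivially agree
    expected = sum_rows * sum_cols / total
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are all-singleton
    return (sum_cells - expected) / (maximum - expected)
```

An all-singleton truth compared against an all-singleton prediction scores 1.0 here, which is why a placeholder identity clusterer can pass a lone-wolf style fixture.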
2026-04-26 06:29:10 -04:00
3eb67c9400 refactor(intel): re-key attacker_intel on attacker_uuid (closes DEBT-041)
The threat-intel surface was IP-keyed on day one as an expedient — the
worker is woken by IP-bearing bus events. ANTI's call: don't carry that
debt. NO IPs as primary keys anywhere on the attacker-intel surface.

Schema:
- attacker_uuid is now the canonical key — UNIQUE + FK to attackers.uuid.
- attacker_ip stays as a denormalised, indexed, NON-UNIQUE value column.
  Updated on every upsert; useful for SIEM payloads and audit lookups,
  but explicitly NOT a key. Model docstring says so.
- Pre-v1, no Alembic migration needed. SQLModel.metadata.create_all()
  builds the new shape on fresh DBs.

Repo:
- upsert_attacker_intel now keys on attacker_uuid.
- get_attacker_intel_by_ip → get_attacker_intel_by_uuid.
- get_unenriched_attacker_ips → get_unenriched_attackers, returning
  [{uuid, ip}] tuples so the worker writes by UUID and dispatches
  provider calls by IP without a second round-trip.

Worker:
- _enrich_one(uuid, ip, ...) — UUID lands on the row, IP rides for
  provider egress.
- attacker.intel.enriched bus payload gains attacker_uuid alongside
  attacker_ip — webhook → SIEM consumers benefit; no removal.

API:
- GET /api/v1/attackers/{ip}/intel deleted outright (rip-and-replace,
  never deployed beyond dev).
- GET /api/v1/attackers/{uuid}/intel is the only public route, matching
  every other /attackers/* route.

Frontend:
- <IntelPanel uuid={id!} /> uses the URL param directly, fetches in
  parallel with the rest of AttackerDetail rather than waiting on
  attacker.ip.

Tests: re-keyed in place, 39 passed (same coverage as before the
refactor). Provider-impl tests untouched.

DEBT-041: closed in DEBT.md (entry preserved as historical rationale,
summary table flipped to closed, remaining-open list shortened by one).
2026-04-26 05:35:29 -04:00
d3d9bd5aa7 feat(intel): decnet enrich CLI + GET /attackers/{ip}/intel endpoint
CLI command mirrors the reuse-correlate shape (--poll-interval, --ttl-hours,
--daemon). Run it under systemd as a sibling worker.

The API endpoint returns the most recent cached row for an attacker IP
or 404. Auth-gated via require_viewer like every other attacker route.

Also extends the worker test with a real FakeBus so the
attacker.intel.enriched publish path is exercised end-to-end (no longer
a no-op against NullBus).
2026-04-26 05:17:25 -04:00
cd70136d09 feat(intel): wire GreyNoise, AbuseIPDB, Feodo Tracker + ThreatFox
Four concrete IntelProvider impls — three per-IP queries plus one bulk
feed:

* GreyNoiseProvider — community endpoint, optional API key for higher
  rate limit. 404 = unknown (cache the absence so we don't re-query).
* AbuseIPDBProvider — score threshold mapping (>=75 malicious, >=25
  suspicious, else benign). Self-disables with a clear error when no
  API key is configured rather than burning quota.
* FeodoProvider — fetches the bulk botnet C2 IP feed once per refresh
  window and answers every lookup from an in-memory set. Listed = C2.
* ThreatFoxProvider — POST /api/v1/ search_ioc query, optional Auth-Key
  header. Match in data[] = malicious; no_result = absence-not-benign.
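
The AbuseIPDB threshold mapping described above can be sketched as a single function. The thresholds come from this commit message; the function name and the string verdict values are assumptions.

```python
def abuseipdb_verdict(score: int) -> str:
    """Map an abuse confidence score (0-100) to a verdict tier (hypothetical helper)."""
    if score >= 75:
        return "malicious"
    if score >= 25:
        return "suspicious"
    return "benign"
```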

Every provider routes through decnet.net.http.stealth_client so the
egress UA never leaks 'DECNET'.
2026-04-26 05:15:17 -04:00
f49a7db07d feat(intel): worker shell + attacker.intel.enriched bus topic
run_intel_loop fans out across configured providers per IP, writes the
aggregate row, and publishes attacker.intel.enriched. Mirrors the
correlation/reuse_worker.py wake-on pattern: subscribes to
attacker.observed and attacker.scored for sub-second latency, falls back
to a 60s poll when the bus is unavailable. Heartbeat + control-listener
wired so the workers panel sees it like every other supervised worker.

Aggregate verdict picks the strongest provider tier (malicious >
suspicious > benign > unknown). Provider-level errors land in
IntelResult.error and are logged without poisoning the row — partial
success is the expected case for free-tier providers under their daily
caps.
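
A minimal sketch of the strongest-tier rule, assuming verdicts are plain strings; the helper name and the handling of unrecognised values are assumptions, not the worker's actual code.

```python
# Tier order from the commit message: malicious > suspicious > benign > unknown.
_TIER = {"unknown": 0, "benign": 1, "suspicious": 2, "malicious": 3}


def aggregate_verdict(verdicts: list) -> str:
    """Pick the strongest provider verdict (hypothetical helper)."""
    # Assumption: values outside the four known tiers are ignored.
    known = [v for v in verdicts if v in _TIER]
    return max(known, key=_TIER.__getitem__, default="unknown")
```

A provider that errored out simply contributes no verdict, so a partially successful pass still yields a usable aggregate.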

Concrete provider impls land in follow-up commits; the worker is fully
exercised here against fake providers so the framing is locked in.
2026-04-26 05:01:47 -04:00
58ca9075db feat(net): stealth-egress httpx client factory
Outbound calls to 3rd-party services (threat-intel providers, future TI
lookups) MUST NOT advertise 'DECNET' in their user-agent — operators
running honeypots want their reconnaissance dependencies to look like
generic infra. New decnet.net.http.stealth_client() returns a fresh
httpx.AsyncClient with a curl-shaped UA (pinned to a single constant so
future siblings — browser-shaped, Go-shaped — sit next to it cleanly).

Internal egress (webhook → operator's own SIEM, swarm worker → master)
keeps its DECNET-tagged UA; the docstring is explicit about not routing
those through this client.
2026-04-26 04:59:34 -04:00
023bc1993d feat(intel): provider ABC + lazy factory
IntelProvider is async-first (every concrete provider does HTTP), bounded
by a per-provider asyncio.Semaphore, and contractually never raises —
errors land in IntelResult.error so a single provider's outage doesn't
poison the worker pass for an entire IP.
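
The never-raises contract can be illustrated roughly as follows. Only the IntelProvider and IntelResult names and the error field come from this commit message; every other field, method name, and default here is an assumption.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntelResult:
    provider: str
    verdict: str = "unknown"
    error: Optional[str] = None


class IntelProvider(ABC):
    name = "base"

    def __init__(self, max_concurrency: int = 4) -> None:
        # Per-provider bound on in-flight lookups.
        self._sem = asyncio.Semaphore(max_concurrency)

    @abstractmethod
    async def _query(self, ip: str) -> str:
        """Return a verdict string; may raise."""

    async def lookup(self, ip: str) -> IntelResult:
        async with self._sem:
            try:
                return IntelResult(self.name, verdict=await self._query(ip))
            except Exception as exc:  # contract: errors land in the result
                return IntelResult(self.name, error=f"{type(exc).__name__}: {exc}")


class AlwaysBenign(IntelProvider):
    name = "fake-ok"

    async def _query(self, ip: str) -> str:
        return "benign"


class OverQuota(IntelProvider):
    name = "fake-quota"

    async def _query(self, ip: str) -> str:
        raise RuntimeError("daily cap reached")


async def main() -> list:
    providers = [AlwaysBenign(), OverQuota()]
    return await asyncio.gather(*(p.lookup("203.0.113.7") for p in providers))


results = asyncio.run(main())
```

The failing fake provider yields a result with error set while the other still reports its verdict, which is the partial-success behaviour described above.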

Factory returns a list (not a singleton like geoip) because intel
enrichment fans out across all enabled providers per IP, with row-level
partial-success handling. Lazy imports keep the module dependency-free
when intel is disabled.

Concrete providers (greynoise/abuseipdb/feodo/threatfox) land in
follow-up commits — factory references them via lazy import so tests
covering the disabled and unknown-name paths pass on their own.
2026-04-26 04:58:38 -04:00
0dd3811436 feat(intel): attacker_intel table + repo helpers
New TTL-cached threat-intel row keyed by attacker IP, with per-provider
verdict/raw/queried_at columns for GreyNoise, AbuseIPDB, abuse.ch Feodo
Tracker and ThreatFox. Carries schema_version from day one (federation
wire-format precedent set by SessionProfile). Repo gains
upsert_attacker_intel, get_attacker_intel_by_ip, and a
get_unenriched_attacker_ips backfill primitive that picks fresh + stale
rows for the forthcoming 'decnet enrich' worker.

Also documents the open-source intel-source backlog in DEVELOPMENT_V2.
2026-04-26 04:56:47 -04:00
50870f2e7a feat(creds): surface plaintext/b64 secret on reuse findings
The CredentialReuse table only stores the sha256+kind hash of the
secret; the printable + b64 forms live on the underlying Credential
rows. The dashboard drawer was therefore showing only the hash, which
defeats most of the value of having a reuse view in the first place.

Repo helpers list_credential_reuses + get_credential_reuse_by_id now
issue one batched SELECT against credentials keyed on the sha256s in
the result page and graft secret_printable + secret_b64 onto each row
before returning. The drawer renders the same printable/b64 code-block
the credentials inspector uses.
2026-04-26 04:34:19 -04:00
0d2283e10c chore(cli): remove dead decnet correlate command
The CLI was a day-one debug helper that read a log file or stdin and
printed a traversal table. It hadn't been wired to the live data path
since the engine moved into the profiler worker (DEBT.md:218). No
deploy unit, no caller, no doc relied on it. Removed the command and
its two tests; `decnet/correlation/` stays as a library consumed by
the profiler and the reuse correlator.
2026-04-26 04:26:15 -04:00
181c792753 feat(api): GET /credential-reuse list + detail endpoints
Read-only routes for the credential-reuse findings produced by the
correlator. Mirrors the /credentials route shape: JWT-gated via
require_viewer, paginated with optional secret_kind /
min_target_count filters, and a 404-on-missing detail route.

No POST/PUT/PATCH (and no body parsing) so no 400 contract is
documented.
2026-04-26 03:40:08 -04:00
590c2b0fac feat(correlation): credential-reuse engine + reuse-correlate worker
Adds CorrelationEngine.correlate_credential_reuse + the
`decnet reuse-correlate` long-running worker. The worker mirrors the
mutator's bus-wake + slow-tick pattern: wakes on credential.captured
and attacker.observed for sub-second latency, falls back to a 60s
poll if the bus is unavailable, and publishes
credential.reuse.detected once per new or grown CredentialReuse row
(group-deduped so a 5-cred reuse doesn't emit 5 partial events).
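
The grouping and emit-once logic could look roughly like this. The row shape, the definition of a target as a (decky, service) pair, and both function names are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialRow:
    secret_sha256: str
    decky: str
    service: str


def find_reuse_groups(rows: list, min_targets: int = 2) -> dict:
    """Group rows by secret hash; keep secrets seen on several targets."""
    targets = defaultdict(set)
    for row in rows:
        targets[row.secret_sha256].add((row.decky, row.service))
    return {sha: seen for sha, seen in targets.items() if len(seen) >= min_targets}


def grown_groups(groups: dict, published_counts: dict) -> list:
    """Hashes whose target count exceeds what was last published.

    One entry per group, so a group that gained several rows in a single
    pass produces a single event.
    """
    return sorted(
        sha for sha, seen in groups.items()
        if len(seen) > published_counts.get(sha, 0)
    )
```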

The web ingester now publishes credential.captured after every
successful Credential upsert; bus + new repo helper
find_credential_reuse_candidates feed the engine pass.
2026-04-26 03:37:49 -04:00
00ecea924a feat(profiler): backfill Credential.attacker_uuid on attacker upsert
Credential capture runs before the profiler mints an Attacker, so
Credential.attacker_uuid is nullable on write. The profiler now
backfills the FK after each successful upsert_attacker. Soft-fail
posture matches the surrounding behavior + smtp rollups so a backfill
error never blocks the next attacker.
2026-04-26 03:30:44 -04:00
ce4be68501 feat(creds): cred-reuse foundation + vectorstore scaffold
Lays the storage and bus substrate for the "credential reuse patterns"
task in DEVELOPMENT.md and scaffolds decnet/vectorstore/ as the future
substrate for statistical attacker re-identification over behavioral
fingerprints. No correlator, profiler, API, or dashboard wiring in
this commit — see TODO.md for the handoff.

Schema:
  - Credential.attacker_uuid (nullable FK to attackers.uuid),
    backfilled by the profiler post-write to avoid coupling the
    capture path to the profiler's ordering.
  - CredentialReuse table — UUID PK, JSON list columns for the
    accumulating attacker_uuids/ips/deckies/services, target_count
    (the discriminative scalar), confidence reserved for a future
    fuzzy-credential pass.

Repo:
  - upsert_credential_reuse / list_credential_reuses /
    get_credential_reuse_by_id / update_credential_attacker_uuid.
  - Renamed pre-existing get_credential_reuse(secret_sha256) to
    get_credential_attempts_for_secret(secret_sha256) — the new
    findings table needs the cleaner name.

Bus topics:
  - credential.captured (one per Credential upsert)
  - credential.reuse.detected (correlator-emitted on insert/grow)

Vectorstore subpackage (decnet/vectorstore/, flat layout mirroring
decnet/bus/):
  - BaseVectorStore ABC keyed by (kind, id) — kind discriminator
    means new feature families are additive, no schema migration.
  - FakeVectorStore (in-memory L2 KNN), NullVectorStore (no-op for
    DECNET_VECTORSTORE_ENABLED=false), SqliteVecVectorStore (lazy
    sqlite_vec extension load, one vec0 virtual table per kind).
  - get_vectorstore() env-driven dispatch with graceful fallback
    to FakeVectorStore when the sqlite-vec extension isn't on the
    host, so workers don't crash on a missing optional dep.
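
A minimal sketch of an in-memory L2 nearest-neighbour store keyed by (kind, id), in the spirit of FakeVectorStore. The class name, method names, and return shape are assumptions; the real BaseVectorStore interface is not shown in this commit message.

```python
import math


class InMemoryVectorStore:
    """Brute-force L2 KNN over vectors keyed by (kind, id) (illustrative only)."""

    def __init__(self) -> None:
        self._rows = {}

    def upsert(self, kind: str, id: str, vector: list) -> None:
        self._rows[(kind, id)] = list(vector)

    def query(self, kind: str, vector: list, k: int = 5) -> list:
        # Only rows of the same kind and dimensionality are comparable.
        scored = [
            (row_id, math.dist(vector, stored))
            for (row_kind, row_id), stored in self._rows.items()
            if row_kind == kind and len(stored) == len(vector)
        ]
        scored.sort(key=lambda item: item[1])
        return scored[:k]
```

Because the kind is part of the key, a new feature family only adds rows under a new kind value and never touches existing ones.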

Tests: 26 new (11 cred-reuse repo, 15 vectorstore). Existing
credentials and base-repo tests updated for the rename. Total: 34
passing on the touched files.
2026-04-26 03:18:34 -04:00
817ce32e6d fix(collector): label-based fleet container discovery
The events watcher's start-event filter previously called
_load_service_container_names(), which reads decnet-state.json on
every event. decnet deploy writes that state file out-of-band
with docker compose up, so a container's start event could
arrive before the state was committed — the watcher then dropped
the event silently and never tailed the container's stdout. The
visible symptom was an empty Credentials view (and Logs/Bounty)
after a fresh deploy until the collector was manually restarted.

Fix: stamp decnet.fleet.{service,decky,service_name} labels on
every fleet service container at compose-time, and let the
collector recognize either the fleet or topology label without
touching the state file. The state-file name match remains as a
fallback for legacy containers that predate the new labels.
2026-04-25 08:11:21 -04:00
4566146d50 feat(api): GET /credentials endpoint
Surfaces the Credential table (deduped attacker auth attempts) via
a new /api/v1/credentials route. Mirrors the Bounty cache pattern
(5s TTL on the unfiltered default page) and reuses the existing
get_credentials / get_total_credentials repo methods + the already
defined CredentialsResponse DTO. Filters: search, service, attacker_ip.
2026-04-25 07:51:20 -04:00
b3d1301925 feat(creds): DEBT-040 Phase 3 — RDP NLA / CredSSP NTLMv2 capture
When RDP_ENABLE_NLA=true (service_cfg.nla=true on the topology side),
confirm PROTOCOL_HYBRID on the X.224 Connection Confirm, upgrade the
socket to TLS using a self-signed cert generated at first start by
the entrypoint, then drive a tiny CredSSP loop:

- Read inbound TSRequest DER (bounded to MAX_TSREQUEST_LEN).
- Scan for the NTLMSSP signature, dispatch on message type:
  Type 1 -> respond with a hand-built TSRequest carrying our Type 2
  challenge. Type 3 -> parse_type3() and emit auth_attempt with the
  universal credential SD shape (secret_kind = ntlmssp_v2).
- Hand-built DER: no pyasn1 dependency.

Also folds in a small fix-up to commit 1: SMB SERVER_CHALLENGE was
hardcoded to 0x11..0x88 across the fleet, which would let a scanner
fingerprint every DECNET decky by its NTLM challenge. Both SMB and
RDP now derive the 8-byte challenge from
instance_seed.random_bytes(8, "ntlm_challenge"), giving each decky a
deterministic-but-distinct value. SMB Dockerfile gets the
instance_seed.py copy too (was synced into the build context but not
COPYed into the image).
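
One way to get a deterministic-but-distinct value per decky is a keyed hash over a label. This is a sketch under the assumption that the seed is the decky's NODE_NAME; it is not the actual instance_seed.random_bytes implementation, and the function name is hypothetical.

```python
import hashlib
import hmac


def derive_bytes(node_name: str, length: int, label: str) -> bytes:
    """Derive `length` stable bytes from a node name and a purpose label."""
    if not 0 < length <= hashlib.sha256().digest_size:
        raise ValueError("length must be between 1 and 32")
    digest = hmac.new(
        node_name.encode("utf-8"), label.encode("utf-8"), hashlib.sha256
    ).digest()
    return digest[:length]
```

The same node name always yields the same 8-byte challenge across restarts, while two deckies diverge, so the challenge stops being a fleet-wide fingerprint.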

- decnet/services/rdp.py: optional service_cfg.nla bool flips
  RDP_ENABLE_NLA in the compose env.
- decnet/templates/rdp/Dockerfile + entrypoint.sh: openssl install +
  per-decky cert generation gated on RDP_ENABLE_NLA.
- 9 NLA unit tests cover the DER reader/builder, _handle_nla round-
  trip with Type 1 / Type 3, oversized-DER rejection, and per-
  NODE_NAME challenge divergence.
- DEBT.md: DEBT-040 closed; full TS_INFO_PACKET capture documented as
  a follow-up if attacker telemetry justifies it.
2026-04-25 07:42:52 -04:00
a8b9c82c97 feat(creds): DEBT-040 Phase 2 — RDP X.224 cookie capture
Replace Twisted-based connection logger with an asyncio handler that
parses the X.224 Connection Request, extracts the mstshash routing
cookie (universal across mstsc / FreeRDP / Hydra / ncrack / MSF
rdp_login), records the rdpNegRequest.requestedProtocols flags, and
answers with a well-formed X.224 Connection Confirm selecting
PROTOCOL_RDP.
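
A rough sketch of the parsing step, assuming a single complete TPKT frame is already in hand. The function name and the returned dict keys are hypothetical; the layout follows the TPKT / X.224 Connection Request / rdpNegRequest structure as I understand it from MS-RDPBCGR.

```python
import struct
from typing import Optional

_COOKIE_PREFIX = b"Cookie: mstshash="


def parse_x224_connection_request(data: bytes) -> Optional[dict]:
    # TPKT header: version (3), reserved, 16-bit big-endian total length.
    if len(data) < 11 or data[0] != 3:
        return None
    (tpkt_len,) = struct.unpack_from(">H", data, 2)
    if tpkt_len < 11 or tpkt_len > len(data):
        return None
    # X.224 CR header: length indicator, code 0xE0, dst-ref, src-ref, class.
    if (data[5] & 0xF0) != 0xE0:
        return None
    body = data[11:tpkt_len]

    cookie = None
    if body.startswith(_COOKIE_PREFIX):
        end = body.find(b"\r\n")
        if end == -1:
            return None
        cookie = body[len(_COOKIE_PREFIX):end].decode("ascii", errors="replace")
        body = body[end + 2:]

    requested = None
    # rdpNegRequest: type 0x01, flags, 16-bit length (8), 32-bit protocol flags.
    if len(body) >= 8 and body[0] == 0x01:
        _type, _flags, length, protocols = struct.unpack_from("<BBHI", body, 0)
        if length == 8:
            requested = protocols
    return {"cookie": cookie, "requested_protocols": requested}
```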

Scope-down vs. the original DEBT-040 plan: full TS_INFO_PACKET
extraction would require either Standard-RDP-Security RC4 stream-
cipher implementation (with our own RSA pair + MS-RDPBCGR signing) or
a complete MCS+GCC ASN.1/BER stack for the SSL path — both far
exceed the 150 LoC budget the DEBT cited. The mstshash cookie is the
only piece of credential information that flows in plaintext on the
wire when the attacker speaks RDP, so capturing it is the highest-
value-per-byte signal available without going down either rabbit
hole. Phase 3 (CredSSP/NLA, next commit) is where actual NTLMv2
hashes land.

- Drops Twisted dependency from rdp/Dockerfile; adds ntlmssp.py copy
  ahead of the NLA path that consumes it.
- 7 unit tests cover cookie capture, requestedProtocols recording,
  CC framing, no-cookie path, and oversized/non-TPKT drops.
2026-04-25 07:34:42 -04:00
6905c88083 feat(creds): DEBT-040 Phase 1 — SMB NTLMSSP framer
Replace impacket's SimpleSMBServer with a hand-rolled asyncio SMB2
framer that walks Negotiate -> SessionSetup(Type1) -> SessionSetup(Type3)
just deep enough to extract the inner NTLMSSP Type 3 via the shared
parse_type3() parser. Always returns STATUS_LOGON_FAILURE; the
attacker's hash lands in the Credential table, the attacker doesn't
land on the host.

- decnet/engine/deployer.py: _sync_ntlmssp_sources() mirrors the
  auth-helper / sessrec sync pattern, copies _shared/ntlmssp.py into
  smb/ and rdp/ build contexts before docker compose up.
- Dockerfile: drop impacket dep, copy ntlmssp.py.
- 7 unit tests drive the asyncio handler in-process via
  StreamReader.feed_data; assert dialect, MORE_PROCESSING_REQUIRED on
  first SessionSetup, NTLMSSP Type 2 carriage in SPNEGO, credential
  capture with universal SD shape, STATUS_LOGON_FAILURE on Type 3,
  oversized-NBSS / SMB1 / short-PDU drops.
2026-04-25 07:31:41 -04:00
afe02af5c2 feat(creds): NTLMSSP Type 3 parser + DEBT-040 for SMB/RDP/NLA framers
Ships the load-bearing primitive both Phase 5 (SMB) and Phase 7
(RDP NLA) need: a standalone NTLMSSP Type 3 (AUTHENTICATE_MESSAGE)
parser per MS-NLMP §2.2.1.3.

Surface:
  parse_type3(blob) -> dict | None
  find_ntlmssp(buf) -> int   # locate NTLMSSP\0 inside SPNEGO outer

Returns the universal Credential SD shape:
  username + domain (decoded UTF-16-LE or ASCII per NEGOTIATE_UNICODE)
  principal = "DOMAIN\\username"
  secret_kind = "ntlmssp_v1" (24-byte fixed) or "ntlmssp_v2" (variable)
  secret_b64 = base64 of NtChallengeResponse — canonical hashcat input
               (-m 5500 v1, -m 5600 v2)

Bounds-checked for untrusted-input safety. Anonymous binds (empty NT
response) return None — no credential to record.
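
A simplified sketch of such a parser, assuming the fixed AUTHENTICATE_MESSAGE header layout from MS-NLMP (field descriptors at offsets 12 through 52, NegotiateFlags at 60). It reuses the two names from the surface above for readability but is not the shipped code; the bare-username principal for an empty domain is an assumption.

```python
import base64
import struct
from typing import Optional

SIGNATURE = b"NTLMSSP\x00"
NEGOTIATE_UNICODE = 0x00000001


def find_ntlmssp(buf: bytes) -> int:
    """Offset of the NTLMSSP signature inside an outer blob, or -1."""
    return buf.find(SIGNATURE)


def _field(blob: bytes, pos: int) -> Optional[bytes]:
    # Each descriptor is Len (2), MaxLen (2), BufferOffset (4), little-endian.
    length, _maxlen, offset = struct.unpack_from("<HHI", blob, pos)
    if offset + length > len(blob):
        return None
    return blob[offset:offset + length]


def parse_type3(blob: bytes) -> Optional[dict]:
    if len(blob) < 64 or blob[:8] != SIGNATURE:
        return None
    (msg_type,) = struct.unpack_from("<I", blob, 8)
    if msg_type != 3:
        return None
    nt = _field(blob, 20)
    domain_raw = _field(blob, 28)
    user_raw = _field(blob, 36)
    if nt is None or domain_raw is None or user_raw is None:
        return None
    if not nt:
        return None  # anonymous bind: nothing to record
    (flags,) = struct.unpack_from("<I", blob, 60)
    encoding = "utf-16-le" if flags & NEGOTIATE_UNICODE else "ascii"
    domain = domain_raw.decode(encoding, errors="replace")
    username = user_raw.decode(encoding, errors="replace")
    return {
        "username": username,
        "domain": domain,
        # Assumption: an empty domain yields a bare username.
        "principal": f"{domain}\\{username}" if domain else username,
        "secret_kind": "ntlmssp_v1" if len(nt) == 24 else "ntlmssp_v2",
        "secret_b64": base64.b64encode(nt).decode("ascii"),
    }
```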

7 unit tests cover NTLMv1/v2 distinction, ASCII vs Unicode strings,
empty-domain shape, malformed signature/type rejection, and SPNEGO-
wrapped find_ntlmssp() lookup.

DEBT-040 opens to track the three remaining protocol framers that
will consume this parser:
  - SMB: hand-rolled SMB2 + Session Setup framer (~200 LoC) replacing
    Impacket's opaque SimpleSMBServer
  - RDP basic auth: TPKT/X.224/MCS framer for legacy plaintext path
    (~150 LoC)
  - RDP NLA: TLS upgrade + CredSSP TSRequest parser, reuses parse_type3
    via the SPNEGO inner blob (~250 LoC)

Each of these is a substantial protocol implementation; landing them
inline with the Phase 1-3+6 cred-coverage rollout would have inflated
the session beyond reasonable scope. Cred-reuse analytics already work
across the 12 services covered in this session; the deferred three
just round out the fleet.
2026-04-25 07:19:30 -04:00
9777aa7677 feat(creds): Phase 6 — MongoDB SCRAM credential capture
Plugs the cred-coverage gap for MongoDB. The template previously
parsed only the wire opcode + length and discarded the BSON body
entirely, so SCRAM-SHA-{1,256} client-proofs flowed straight through
without ever landing in the Credential table.

Adds an inline minimal BSON walker (~100 LoC) covering the 7 type
codes auth commands actually use: string, doc, array, binary, bool,
int32, int64. Hand-rolled rather than pulling pymongo as a runtime
dep — the parser is bounds-checked for untrusted-input safety
(won't loop on malformed length fields).

Wire flow MongoDB clients use for auth:
- OP_MSG body section (kind=0) → BSON doc with `saslStart` field
  carrying mechanism + payload (SCRAM client-first-message:
  "n,,n=<user>,r=<nonce>"). Username extracted, pinned to the
  per-connection _sasl_username + _sasl_mechanism state.
- Subsequent OP_MSG with `saslContinue` → SCRAM client-final-message
  ("c=biws,r=<combined>,p=<base64 client-proof>"). The `p=` value is
  the credential — emitted as secret_kind=scram_sha256 (or _sha1 /
  _unknown depending on the prior saslStart's mechanism), principal
  = the pinned username, secret_b64 = base64 of the decoded proof.
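
The two SCRAM messages in the flow above can be picked apart with plain string handling. A minimal sketch, assuming the payloads have already been pulled out of the BSON documents and decoded to text; the function names are hypothetical.

```python
import base64
from typing import Optional


def _attrs(message: str) -> dict:
    """Split 'k=v,k=v' into a dict; values may themselves contain '='."""
    out = {}
    for part in message.split(","):
        key, sep, value = part.partition("=")
        if sep and len(key) == 1:
            out[key] = value
    return out


def parse_client_first(message: str) -> Optional[str]:
    """Return the username from 'n,,n=<user>,r=<nonce>'."""
    pieces = message.split(",", 2)  # gs2 flag, authzid, bare message
    if len(pieces) != 3:
        return None
    user = _attrs(pieces[2]).get("n")
    if user is None:
        return None
    # Undo the SCRAM escaping of ',' and '=' in usernames.
    return user.replace("=2C", ",").replace("=3D", "=")


def parse_client_final(message: str) -> Optional[bytes]:
    """Return the decoded client proof from 'c=...,r=...,p=<base64>'."""
    proof = _attrs(message).get("p")
    if proof is None:
        return None
    try:
        return base64.b64decode(proof, validate=True)
    except ValueError:
        return None
```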

Reuse semantics: same client-proof across two auth attempts only
matches when both server salt and password were identical (proofs
include the salt). So cross-session reuse correlates only on
credential reuse against the same MongoDB account on the same decky
— honest, non-misleading signal.

680 tests pass across services, service_testing, db, web/ingester,
and core/fingerprinting (the broader scope my recent commits
touched). Phases 4, 5, 7 still pending (RDP basic-auth, SMB
NTLMSSP, RDP NLA).
2026-04-25 07:15:44 -04:00
e4bf8fa012 feat(creds): Phase 3 — HTTP/HTTPS POST form body cred extraction
Login forms (wp-login.php, phpMyAdmin, Joomla, etc.) ship a
`Content-Type: application/x-www-form-urlencoded` body with field
names like username/user/email/log/pwd/password. The HTTP/HTTPS
templates already captured the body as opaque bytes; now they parse
common login-form shapes into the universal credential SD shape.

Adds canonical templates/syslog_bridge.py:
extract_form_credentials(body, content_type) -> dict | None.

Field-name matching is case-insensitive and covers:
  Principal: username, user, email, login, userid, account, log,
             user_login (WordPress), uname / pma_username (phpMyAdmin)
  Secret:    password, pass, pwd, passwd, passwort, mot_de_passe,
             user_password (WordPress), pma_password (phpMyAdmin)

The HTTP/HTTPS log_request handlers now call:
  cred = classify_authorization(...) or extract_form_credentials(...)
— Authorization wins when present (current session credential beats
a follow-up form change), but POSTs to /wp-login.php with no Auth
header still surface their cleartext creds.

Secret-without-principal is intentional: a reset-confirm or auto-
fill abuse may carry a password without any field that maps to our
principal list. The cred row writes with principal=None — the
sha256 still correlates across services for reuse analytics.
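
A sketch of how such an extractor might look, using only the standard library. The field lists are copied from this commit message; the function name, the lookup order within each list, and the returned keys beyond principal / secret_kind / secret_b64 are assumptions, and this is not the code in syslog_bridge.py.

```python
import base64
from typing import Optional
from urllib.parse import parse_qs

PRINCIPAL_FIELDS = (
    "username", "user", "email", "login", "userid", "account", "log",
    "user_login", "uname", "pma_username",
)
SECRET_FIELDS = (
    "password", "pass", "pwd", "passwd", "passwort", "mot_de_passe",
    "user_password", "pma_password",
)


def extract_form_credentials_sketch(body: str, content_type: str) -> Optional[dict]:
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/x-www-form-urlencoded":
        return None
    # Case-insensitive field names; the first value wins for repeated fields.
    fields = {name.lower(): values[0] for name, values in parse_qs(body).items()}
    secret = next((fields[n] for n in SECRET_FIELDS if n in fields), None)
    if secret is None:
        return None
    # A secret without a principal is still recorded (principal=None).
    principal = next((fields[n] for n in PRINCIPAL_FIELDS if n in fields), None)
    return {
        "principal": principal,
        "secret_kind": "plaintext",
        "secret_b64": base64.b64encode(secret.encode("utf-8")).decode("ascii"),
    }
```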

The body capture cap bumped from 512 → 4096 chars so reasonable
form bodies aren't truncated before the cred extractor sees them;
the body stored in fields.body stays at 512 chars (display-friendly).

36 helper + emitter tests pass. Phases 4-7 still pending.
2026-04-25 07:10:05 -04:00
0c1316f74c feat(creds): Phase 2 — MySQL handshake hash + MSSQL Login7 plaintext
Closes the cred-coverage gap for two database services that had been
capturing only the username:

- MySQL — extends _handle_packet to read the auth-response after the
  null-terminated username. mysql_native_password puts a 1-byte
  length followed by 20 bytes: SHA1(password) XOR SHA1(salt +
  SHA1(SHA1(password))). Plaintext irrecoverable, lands as
  secret_kind="mysql_native_password" with the 20 hash bytes in
  secret_b64. Hash is canonical for "hashcat -m 11200" if an operator
  ever wants to crack offline.

- MSSQL — fixes a pre-existing bug AND adds password capture. The
  prior _parse_login7_username read offsets 36/38, which is actually
  ibHostName/cchHostName in the Login7 layout — username sat at
  40/42 and was never touched. Replaced with _parse_login7_creds()
  reading the correct offsets (40 username, 44 password). The Login7
  password is obfuscated by swapping the nibbles of each byte and then
  XORing with 0xa5; _deobfuscate_login7_password reverses it (XOR,
  then nibble swap). Plaintext-recoverable, lands as
  secret_kind="plaintext".
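
The Login7 password transform is small enough to show in full. A minimal sketch following MS-TDS as I understand it: the client swaps the nibbles of each byte of the UTF-16-LE password and XORs with 0xA5, so recovery XORs first and swaps second. Function names are illustrative, not the template's own.

```python
def _swap_nibbles(byte: int) -> int:
    return ((byte << 4) & 0xF0) | (byte >> 4)


def obfuscate_login7_password(password: str) -> bytes:
    """Client-side transform: nibble swap, then XOR with 0xA5."""
    raw = password.encode("utf-16-le")
    return bytes(_swap_nibbles(b) ^ 0xA5 for b in raw)


def deobfuscate_login7_password(blob: bytes) -> str:
    """Reverse transform: XOR with 0xA5, then nibble swap."""
    raw = bytes(_swap_nibbles(b ^ 0xA5) for b in blob)
    return raw.decode("utf-16-le", errors="replace")
```

For example, the single character "a" (bytes 0x61 0x00) goes over the wire as 0xB3 0xA5.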

The pre-existing test_login7_auth_logged_and_closes only verified the
error response ships and the connection closes; it didn't validate
the parsed username, so the hostname-as-username bug was silent. New
tests cover both the deobfuscation algorithm directly and the full
ingester round-trip for both services.

Sync: copies the canonical syslog_bridge.py into mysql/ and mssql/
template build contexts so service_testing tests load the version
with classify_authorization + encode_secret available.

37 tests pass in the touched scope. Phases 3-7 still pending.
2026-04-25 07:07:33 -04:00