Third of the five-step identity-resolution substrate. Frontend hooks
into the empty /api/v1/identities/* surface from commit 2; renders
nothing visible when identity_id is null (which is the universal state
until the clusterer ships).
* decnet_web/src/components/IdentityDetail.tsx — new page. Header with
uuid + optional CAMPAIGN / MERGED-INTO badges, stats row
(observations / JA3 / HASSH / payloads / C2), fingerprint tag lists
parsed from the JSON-in-TEXT columns, observations table that links
back to AttackerDetail, conditional analyst-notes panel.
* decnet_web/src/components/AttackerDetail.tsx — IDENTITY badge
inserted in the header row alongside TRAVERSAL. Clicking navigates
to /identities/<uuid>. AttackerData interface gains the optional
identity_id field.
* decnet_web/src/App.tsx — /identities/:id route + lazy-loaded chunk.
Verified by `tsc --noEmit` (clean) and `vite build` (clean — produces
IdentityDetail-*.js as its own lazy chunk). The repo has no JS test
harness; build + type-check are the gate.
Second of the five-step identity-resolution substrate. Ships the API
surface against the empty AttackerIdentity table from commit 1 — every
endpoint returns empty/404 cleanly until the clusterer populates rows.
Routes (auth-gated, viewer role):
* GET /api/v1/identities — paginated list, excludes merged-out rows
* GET /api/v1/identities/{uuid} — detail; transparently follows
merged_into_uuid to surface the canonical winner
* GET /api/v1/identities/{uuid}/observations — Attacker rows FK'd
to the (resolved) identity uuid
Repository (BaseRepository abstract + SQLModelRepository concrete):
* get_identity_by_uuid (with merge-chain following, hop-bounded)
* list_identities / count_identities (excluding merged-out)
* list_observations_for_identity / count_observations_for_identity
Tests: 12 new (empty-table behavior, seeded data, merge-chain
resolution, repo-level smoke against real SQLite). Also fixes the
pre-existing test_base_repo_coverage failure (DEBT-041 added abstract
methods without updating the DummyRepo stub) — included here because
this PR adds 5 more abstract methods, fixing it as a bonus.
474 db/web/profiler/correlation tests green.
Schema-only commit, first of the five-step substrate for identity
resolution. The clusterer that populates identities lands later; this
ships the table empty and the FK uniformly NULL on existing rows.
* decnet/web/db/models/attackers.py — new AttackerIdentity SQLModel
(uuid PK, schema_version, fingerprint summary lists, kd_digraph_simhash,
merged_into_uuid self-FK, all clusterer-populated fields nullable).
Attacker grows a nullable indexed identity_id FK + docstring marking
it as the per-IP observation row.
* decnet/web/db/models/__init__.py — re-exports AttackerIdentity.
* tests/db/test_identity_schema.py — 9 schema invariants: table exists,
identity_id nullable + indexed, FK targets attacker_identities.uuid,
schema_version defaults to 1, attacker rows inserted with NULL
identity_id, FK constraint blocks orphans.
463 unrelated db/web/profiler/correlation tests still green. See
development/IDENTITY_RESOLUTION.md for the full design.
Pre-implementation design for the observation/identity/campaign
three-level hierarchy. Sibling-add approach (not rename) — keep the
attackers table name, add attacker_identities as a sibling, nullable
attackers.identity_id FK. Documents the rationale, schema, bus
topics, API surface, and the 5-commit implementation sequence.
Companion to development/CAMPAIGN_CLUSTERING.md. Substrate for the
clusterer worker designed there; ships empty so the campaign
clustering fixtures can encode honest multi-row-per-actor scenarios.
Two campaigns sharing a credential wordlist; everything else (ASN, IPs,
JA3, HASSH, active hours) divergent. Pass condition: clusterer must NOT
merge. Protects against the "credential overlap is identity" failure
mode that commodity wordlists invite.
* tests/clustering/fixture_harness.py — shared assert_fixture_bounds
helper + identity_clusterer (placeholder, trivially correct on
all-singleton fixtures) + credential_jaccard_clusterer (deliberately-
bad reference used to PROVE the fixture catches what it should).
* tests/clustering/test_shared_wordlist_fixture.py — bounds pass with
identity, bounds FAIL (homogeneity → 0) with the bad credential
clusterer. The latter is the proof the fixture earns its keep.
* tests/fixtures/campaigns/shared_wordlist.{yaml,expected.yaml}.
* tests/clustering/test_lone_wolf_fixture.py — refactored onto the
shared harness. No behavior change.
Pre-implementation scaffolding for campaign clustering. The simulator is
the spec — algorithm code follows once fixtures + metrics are stable.
* decnet/clustering/ukc.py — UKCPhase enum (19 phases across In/Through/Out
stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns
with future MITRE ATT&CK tagging so synthetic data and runtime phase
inference don't need renaming when TTP-tagging lands.
* tests/factories/campaign_factory.py — YAML DSL parser + deterministic
generator emitting truth-labeled SyntheticAttacker / SyntheticSession
records. Validates phase names, warns on unobservable phases, supports
multi-campaign + noise corpora.
* tests/clustering/metrics.py — pure-Python ARI / homogeneity /
completeness / singleton_recall (no sklearn dep). Decided before any
algorithm exists, on purpose.
* tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml} — fixture 3
from the design doc; simplest of the six, exercises the full pipeline
with an identity-clusterer placeholder.
* development/CAMPAIGN_CLUSTERING.md — design spec for the feature.
* development/DEVELOPMENT_V2.md — note on DSL evolution path
(concurrent phases, multi-actor per phase) deferred post-v1.
The threat-intel surface was IP-keyed on day one as an expedient — the
worker is woken by IP-bearing bus events. ANTI's call: don't carry that
debt. NO IPs as primary keys anywhere on the attacker-intel surface.
Schema:
- attacker_uuid is now the canonical key — UNIQUE + FK to attackers.uuid.
- attacker_ip stays as a denormalised, indexed, NON-UNIQUE value column.
Updated on every upsert; useful for SIEM payloads and audit lookups,
but explicitly NOT a key. Model docstring says so.
- Pre-v1, no Alembic migration needed. SQLModel.metadata.create_all()
builds the new shape on fresh DBs.
Repo:
- upsert_attacker_intel now keys on attacker_uuid.
- get_attacker_intel_by_ip → get_attacker_intel_by_uuid.
- get_unenriched_attacker_ips → get_unenriched_attackers, returning
[{uuid, ip}] tuples so the worker writes by UUID and dispatches
provider calls by IP without a second round-trip.
Worker:
- _enrich_one(uuid, ip, ...) — UUID lands on the row, IP rides for
provider egress.
- attacker.intel.enriched bus payload gains attacker_uuid alongside
attacker_ip — webhook → SIEM consumers benefit; no removal.
API:
- GET /api/v1/attackers/{ip}/intel deleted outright (rip-and-replace,
never deployed beyond dev).
- GET /api/v1/attackers/{uuid}/intel is the only public route, matching
every other /attackers/* route.
Frontend:
- <IntelPanel uuid={id!} /> uses the URL param directly, fetches in
parallel with the rest of AttackerDetail rather than waiting on
attacker.ip.
Tests: re-keyed in place, 39 passed (same coverage as before the
refactor). Provider-impl tests untouched.
DEBT-041: closed in DEBT.md (entry preserved as historical rationale,
summary table flipped to ✅, remaining-open list shortened by one).
Read-only IP-keyed intel surface on the attacker detail page. Renders
the aggregate verdict (color-coded MALICIOUS/SUSPICIOUS/BENIGN/NO SIGNAL)
plus a per-provider row with verdict, queried-at timestamp, and
provider-specific detail (GreyNoise classification, AbuseIPDB
0-100 score, Feodo C2 listing + malware family, ThreatFox IOC match
+ malware family). 404 from the API renders as 'NO INTEL CACHED YET'
with a hint that decnet enrich will populate it on the next pass —
TTL drives the refresh, no manual button.
DEBT-041 documents the API/UI IP-keying as a v1 expedient that will
need a UUID-keyed sibling endpoint before federation lands. NAT
collisions, attacker.uuid consistency across attacker routes, and the
sequential-fetch UX are all callouts on that ticket; the migration
sketch is laid out so the v1.x followup is unambiguous.
Frontend build: clean (55.58 kB AttackerDetail bundle, +~5kB for the
panel). Note: not browser-tested in this session — recommend a manual
smoke against a deployed master before tagging.
Mirrors decnet-reuse-correlator.service.j2: same hardening posture
(NoNewPrivileges, ProtectSystem=full, etc.), same restart policy, same
log file convention. The decnet init renderer picks it up automatically
via the decnet-*.service.j2 glob.
Also reconciles a naming inconsistency I shipped earlier: the heartbeat
name was 'intel' (the package) but the CLI command and unit are 'enrich'
(the action). Renamed the heartbeat to 'enrich' so the workers panel
displays the same string the operator types and the same string in the
systemd unit file. Convention across the project: heartbeat name =
registry key = unit basename = CLI command name.
Registers 'enrich' in worker_registry.KNOWN_WORKERS and in the
start-all preferred order. The decnet.target Wants= list also picks
up the new unit so 'systemctl start decnet.target' brings everything
up together.
CLI command mirrors the reuse-correlate shape (--poll-interval, --ttl-hours,
--daemon). Run it under systemd as a sibling worker.
The API endpoint returns the most recent cached row for an attacker IP
or 404. Auth-gated via require_viewer like every other attacker route.
Also extends the worker test with a real FakeBus so the
attacker.intel.enriched publish path is exercised end-to-end (no longer
a no-op against NullBus).
Four concrete IntelProvider impls — three per-IP queries plus one bulk
feed:
* GreyNoiseProvider — community endpoint, optional API key for higher
rate limit. 404 = unknown (cache the absence so we don't re-query).
* AbuseIPDBProvider — score threshold mapping (>=75 malicious, >=25
suspicious, else benign). Self-disables with a clear error when no
API key is configured rather than burning quota.
* FeodoProvider — fetches the bulk botnet C2 IP feed once per refresh
window and answers every lookup from an in-memory set. Listed = C2.
* ThreatFoxProvider — POST /api/v1/ search_ioc query, optional Auth-Key
header. Match in data[] = malicious; no_result = absence-not-benign.
Every provider routes through decnet.net.http.stealth_client so the
egress UA never leaks 'DECNET'.
run_intel_loop fans out across configured providers per IP, writes the
aggregate row, and publishes attacker.intel.enriched. Mirrors the
correlation/reuse_worker.py wake-on pattern: subscribes to
attacker.observed and attacker.scored for sub-second latency, falls back
to a 60s poll when the bus is unavailable. Heartbeat + control-listener
wired so the workers panel sees it like every other supervised worker.
Aggregate verdict picks the strongest provider tier (malicious >
suspicious > benign > unknown). Provider-level errors land in
IntelResult.error and are logged without poisoning the row — partial
success is the expected case for free-tier providers under their daily
caps.
Concrete provider impls land in follow-up commits; the worker is fully
exercised here against fake providers so the framing is locked in.
Outbound calls to 3rd-party services (threat-intel providers, future TI
lookups) MUST NOT advertise 'DECNET' in their user-agent — operators
running honeypots want their reconnaissance dependencies to look like
generic infra. New decnet.net.http.stealth_client() returns a fresh
httpx.AsyncClient with a curl-shaped UA (pinned to a single constant so
future siblings — browser-shaped, Go-shaped — sit next to it cleanly).
Internal egress (webhook → operator's own SIEM, swarm worker → master)
keeps its DECNET-tagged UA; the docstring is explicit about not routing
those through this client.
IntelProvider is async-first (every concrete provider does HTTP), bounded
by a per-provider asyncio.Semaphore, and contractually never raises —
errors land in IntelResult.error so a single provider's outage doesn't
poison the worker pass for an entire IP.
Factory returns a list (not a singleton like geoip) because intel
enrichment fans out across all enabled providers per IP, with row-level
partial-success handling. Lazy imports keep the module dependency-free
when intel is disabled.
Concrete providers (greynoise/abuseipdb/feodo/threatfox) land in
follow-up commits — factory references them via lazy import so tests
covering the disabled and unknown-name paths pass on their own.
New TTL-cached threat-intel row keyed by attacker IP, with per-provider
verdict/raw/queried_at columns for GreyNoise, AbuseIPDB, abuse.ch Feodo
Tracker and ThreatFox. Carries schema_version from day one (federation
wire-format precedent set by SessionProfile). Repo gains
upsert_attacker_intel, get_attacker_intel_by_ip, and a
get_unenriched_attacker_ips backfill primitive that picks fresh + stale
rows for the forthcoming 'decnet enrich' worker.
Also documents the open-source intel-source backlog in DEVELOPMENT_V2.
The CredentialReuse table only stores the sha256+kind hash of the
secret; the printable + b64 forms live on the underlying Credential
rows. The dashboard drawer was therefore showing only the hash, which
defeats most of the value of having a reuse view in the first place.
Repo helpers list_credential_reuses + get_credential_reuse_by_id now
issue one batched SELECT against credentials keyed on the sha256s in
the result page and graft secret_printable + secret_b64 onto each row
before returning. The drawer renders the same printable/b64 code-block
the credentials inspector uses.
Adds the systemd template for the credential-reuse correlator daemon
and wires it into decnet.target so `decnet init` installs it
automatically (the unit installer globs decnet-*.service.j2). Mirrors
the mutator template: bus-woken Type=simple service with the standard
hardening + on-failure restart.
Also registers `reuse-correlator` in the in-process worker registry
(so the dashboard panel surfaces its heartbeat instead of dropping it
as unknown) and slots it into the start-all preferred order between
mutator and webhook.
Library shape (decnet/correlation/) consumed by profiler + reuse
correlator is the right model. The `decnet correlate` CLI helper has
been removed in the previous commit.
The CLI was a day-one debug helper that read a log file or stdin and
printed a traversal table. It hadn't been wired to the live data path
since the engine moved into the profiler worker (DEBT.md:218). No
deploy unit, no caller, no doc relied on it. Removed the command and
its two tests; `decnet/correlation/` stays as a library consumed by
the profiler and the reuse correlator.
Adds a CREDS/REUSE tab segment on the Credential Vault page. The REUSE
tab lists CredentialReuse rows (paginated 25 per page) ordered by
target_count desc; row-click opens a drawer mirroring the credentials
inspector with a deckies x services grid, attacker links, and a
PROFILING PENDING placeholder when attacker_uuids has not been
backfilled yet.
The CREDS tab gains a REUSE column showing a clickable target-count
badge for credentials whose (sha256, kind, principal) tuple matches a
reuse row; clicking the badge fetches and opens that row's drawer.
Section header gains a manual refresh button (no SSE/polling).
Ticks the credential-reuse line in DEVELOPMENT.md and notes the
vectorstore scaffold.
Read-only routes for the credential-reuse findings produced by the
correlator. Mirrors the /credentials route shape: JWT-gated via
require_viewer, paginated with optional secret_kind /
min_target_count filters, and a 404-on-missing detail route.
No POST/PUT/PATCH (and no body parsing) so no 400 contract is
documented.
Adds CorrelationEngine.correlate_credential_reuse + the
`decnet reuse-correlate` long-running worker. The worker mirrors the
mutator's bus-wake + slow-tick pattern: wakes on credential.captured
and attacker.observed for sub-second latency, falls back to a 60s
poll if the bus is unavailable, and publishes
credential.reuse.detected once per new or grown CredentialReuse row
(group-deduped so a 5-cred reuse doesn't emit 5 partial events).
The web ingester now publishes credential.captured after every
successful Credential upsert; bus + new repo helper
find_credential_reuse_candidates feed the engine pass.
Credential capture runs before the profiler mints an Attacker, so
Credential.attacker_uuid is nullable on write. The profiler now
backfills the FK after each successful upsert_attacker. Soft-fail
posture matches the surrounding behavior + smtp rollups so a backfill
error never blocks the next attacker.
Lays the storage and bus substrate for the "credential reuse patterns"
task in DEVELOPMENT.md and scaffolds decnet/vectorstore/ as the future
substrate for statistical attacker re-identification over behavioral
fingerprints. No correlator, profiler, API, or dashboard wiring in
this commit — see TODO.md for the handoff.
Schema:
- Credential.attacker_uuid (nullable FK to attackers.uuid),
backfilled by the profiler post-write to avoid coupling the
capture path to the profiler's ordering.
- CredentialReuse table — UUID PK, JSON list columns for the
accumulating attacker_uuids/ips/deckies/services, target_count
(the discriminative scalar), confidence reserved for a future
fuzzy-credential pass.
Repo:
- upsert_credential_reuse / list_credential_reuses /
get_credential_reuse_by_id / update_credential_attacker_uuid.
- Renamed pre-existing get_credential_reuse(secret_sha256) to
get_credential_attempts_for_secret(secret_sha256) — the new
findings table needs the cleaner name.
Bus topics:
- credential.captured (one per Credential upsert)
- credential.reuse.detected (correlator-emitted on insert/grow)
Vectorstore subpackage (decnet/vectorstore/, flat layout mirroring
decnet/bus/):
- BaseVectorStore ABC keyed by (kind, id) — kind discriminator
means new feature families are additive, no schema migration.
- FakeVectorStore (in-memory L2 KNN), NullVectorStore (no-op for
DECNET_VECTORSTORE_ENABLED=false), SqliteVecVectorStore (lazy
sqlite_vec extension load, one vec0 virtual table per kind).
- get_vectorstore() env-driven dispatch with graceful fallback
to FakeVectorStore when the sqlite-vec extension isn't on the
host, so workers don't crash on a missing optional dep.
Tests: 26 new (11 cred-reuse repo, 15 vectorstore). Existing
credentials and base-repo tests updated for the rename. Total: 34
passing on the touched files.
The events watcher's start-event filter previously called
_load_service_container_names(), which reads decnet-state.json on
every event. decnet deploy writes that state file out-of-band
with docker compose up, so a container's start event could
arrive before the state was committed — the watcher then dropped
the event silently and never tailed the container's stdout. The
visible symptom was an empty Credentials view (and Logs/Bounty)
after a fresh deploy until the collector was manually restarted.
Fix: stamp decnet.fleet.{service,decky,service_name} labels on
every fleet service container at compose-time, and let the
collector recognize either the fleet or topology label without
touching the state file. The state-file name match remains as a
fallback for legacy containers that predate the new labels.
New /credentials page mirroring the Bounty Vault pattern: list view
with search, dynamic service segment chips, plaintext vs hashed
secret rendering, and an inspector drawer with copyable SHA-256 +
service-fields JSON. Sidebar entry uses the Lock icon to keep
Bounty's Archive/Key visual identity distinct.
Surfaces the Credential table (deduped attacker auth attempts) via
a new /api/v1/credentials route. Mirrors the Bounty cache pattern
(5s TTL on the unfiltered default page) and reuses the existing
get_credentials / get_total_credentials repo methods + the already
defined CredentialsResponse DTO. Filters: search, service, attacker_ip.
When RDP_ENABLE_NLA=true (service_cfg.nla=true on the topology side),
confirm PROTOCOL_HYBRID on the X.224 Connection Confirm, upgrade the
socket to TLS using a self-signed cert generated at first start by
the entrypoint, then drive a tiny CredSSP loop:
- Read inbound TSRequest DER (bounded to MAX_TSREQUEST_LEN).
- Scan for the NTLMSSP signature, dispatch on message type:
Type 1 -> respond with a hand-built TSRequest carrying our Type 2
challenge. Type 3 -> parse_type3() and emit auth_attempt with the
universal credential SD shape (secret_kind = ntlmssp_v2).
- Hand-built DER: no pyasn1 dependency.
Also folds in a small fix-up to commit 1: SMB SERVER_CHALLENGE was
hardcoded to 0x11..0x88 across the fleet, which would let a scanner
fingerprint every DECNET decky by its NTLM challenge. Both SMB and
RDP now derive the 8-byte challenge from
instance_seed.random_bytes(8, "ntlm_challenge"), giving each decky a
deterministic-but-distinct value. SMB Dockerfile gets the
instance_seed.py copy too (was synced into the build context but not
COPYed into the image).
- decnet/services/rdp.py: optional service_cfg.nla bool flips
RDP_ENABLE_NLA in the compose env.
- decnet/templates/rdp/Dockerfile + entrypoint.sh: openssl install +
per-decky cert generation gated on RDP_ENABLE_NLA.
- 9 NLA unit tests cover the DER reader/builder, _handle_nla round-
trip with Type 1 / Type 3, oversized-DER rejection, and per-
NODE_NAME challenge divergence.
- DEBT.md: DEBT-040 closed; full TS_INFO_PACKET capture documented as
a follow-up if attacker telemetry justifies it.
Replace Twisted-based connection logger with an asyncio handler that
parses the X.224 Connection Request, extracts the mstshash routing
cookie (universal across mstsc / FreeRDP / Hydra / ncrack / MSF
rdp_login), records the rdpNegRequest.requestedProtocols flags, and
answers with a well-formed X.224 Connection Confirm selecting
PROTOCOL_RDP.
Scope-down vs. the original DEBT-040 plan: full TS_INFO_PACKET
extraction would require either Standard-RDP-Security RC4 stream-
cipher implementation (with our own RSA pair + MS-RDPBCGR signing) or
a complete MCS+GCC ASN.1/BER stack for the SSL path — both far
exceed the 150 LoC budget the DEBT cited. The mstshash cookie is the
only piece of credential information that flows in plaintext on the
wire when the attacker speaks RDP, so capturing it is the highest-
value-per-byte signal available without going down either rabbit
hole. Phase 3 (CredSSP/NLA, next commit) is where actual NTLMv2
hashes land.
- Drops Twisted dependency from rdp/Dockerfile; adds ntlmssp.py copy
ahead of the NLA path that consumes it.
- 7 unit tests cover cookie capture, requestedProtocols recording,
CC framing, no-cookie path, and oversized/non-TPKT drops.
Replace impacket's SimpleSMBServer with a hand-rolled asyncio SMB2
framer that walks Negotiate -> SessionSetup(Type1) -> SessionSetup(Type3)
just deep enough to extract the inner NTLMSSP Type 3 via the shared
parse_type3() parser. Always returns STATUS_LOGON_FAILURE; the
attacker's hash lands in the Credential table, the attacker doesn't
land on the host.
- decnet/engine/deployer.py: _sync_ntlmssp_sources() mirrors the
auth-helper / sessrec sync pattern, copies _shared/ntlmssp.py into
smb/ and rdp/ build contexts before docker compose up.
- Dockerfile: drop impacket dep, copy ntlmssp.py.
- 7 unit tests drive the asyncio handler in-process via
StreamReader.feed_data; assert dialect, MORE_PROCESSING_REQUIRED on
first SessionSetup, NTLMSSP Type 2 carriage in SPNEGO, credential
capture with universal SD shape, STATUS_LOGON_FAILURE on Type 3,
oversized-NBSS / SMB1 / short-PDU drops.
Ships the load-bearing primitive both Phase 5 (SMB) and Phase 7
(RDP NLA) need: a standalone NTLMSSP Type 3 (AUTHENTICATE_MESSAGE)
parser per MS-NLMP §2.2.1.3.
Surface:
parse_type3(blob) -> dict | None
find_ntlmssp(buf) -> int # locate NTLMSSP\\0 inside SPNEGO outer
Returns the universal Credential SD shape:
username + domain (decoded UTF-16-LE or ASCII per NEGOTIATE_UNICODE)
principal = "DOMAIN\\\\username"
secret_kind = "ntlmssp_v1" (24-byte fixed) or "ntlmssp_v2" (variable)
secret_b64 = base64 of NtChallengeResponse — canonical hashcat input
(-m 5500 v1, -m 5600 v2)
Bounds-checked for untrusted-input safety. Anonymous binds (empty NT
response) return None — no credential to record.
7 unit tests cover NTLMv1/v2 distinction, ASCII vs Unicode strings,
empty-domain shape, malformed signature/type rejection, and SPNEGO-
wrapped find_ntlmssp() lookup.
DEBT-040 opens to track the three remaining protocol framers that
will consume this parser:
- SMB: hand-rolled SMB2 + Session Setup framer (~200 LoC) replacing
Impacket's opaque SimpleSMBServer
- RDP basic auth: TPKT/X.224/MCS framer for legacy plaintext path
(~150 LoC)
- RDP NLA: TLS upgrade + CredSSP TSRequest parser, reuses parse_type3
via the SPNEGO inner blob (~250 LoC)
These are substantial protocol implementations each — landing them
inline with Phase 1-3+6's cred coverage rollout would have inflated
the session beyond reasonable scope. Cred-reuse analytics already work
across the 12 services covered in this session; the deferred three
just round out the fleet.
Plugs the cred-coverage gap for MongoDB. The template previously
parsed only the wire opcode + length and discarded the BSON body
entirely, so SCRAM-SHA-{1,256} client-proofs flowed straight through
without ever landing in the Credential table.
Adds an inline minimal BSON walker (~100 LoC) covering the 7 type
codes auth commands actually use: string, doc, array, binary, bool,
int32, int64. Hand-rolled rather than pulling pymongo as a runtime
dep — the parser is bounds-checked for untrusted-input safety
(won't loop on malformed length fields).
Wire flow MongoDB clients use for auth:
- OP_MSG body section (kind=0) → BSON doc with `saslStart` field
carrying mechanism + payload (SCRAM client-first-message:
"n,,n=<user>,r=<nonce>"). Username extracted, pinned to the
per-connection _sasl_username + _sasl_mechanism state.
- Subsequent OP_MSG with `saslContinue` → SCRAM client-final-message
("c=biws,r=<combined>,p=<base64 client-proof>"). The `p=` value is
the credential — emitted as secret_kind=scram_sha256 (or _sha1 /
_unknown depending on the prior saslStart's mechanism), principal
= the pinned username, secret_b64 = base64 of the decoded proof.
Reuse semantics: same client-proof across two auth attempts only
matches when both server salt and password were identical (proofs
include the salt). So cross-session reuse correlates only on
credential reuse against the same MongoDB account on the same decky
— honest, non-misleading signal.
680 tests pass across services, service_testing, db, web/ingester,
and core/fingerprinting (the broader scope my recent commits
touched). Phases 4, 5, 7 still pending (RDP basic-auth, SMB
NTLMSSP, RDP NLA).
Login forms (wp-login.php, phpMyAdmin, Joomla, etc.) ship a
`Content-Type: application/x-www-form-urlencoded` body with field
names like username/user/email/log/pwd/password. The HTTP/HTTPS
templates already captured the body as opaque bytes; now they parse
common login-form shapes into the universal credential SD shape.
Adds canonical templates/syslog_bridge.py:
extract_form_credentials(body, content_type) -> dict | None.
Field-name matching is case-insensitive and covers:
Principal: username, user, email, login, userid, account, log,
user_login (WordPress), uname / pma_username (phpMyAdmin)
Secret: password, pass, pwd, passwd, passwort, mot_de_passe,
user_password (WordPress), pma_password (phpMyAdmin)
The HTTP/HTTPS log_request handlers now call:
cred = classify_authorization(...) or extract_form_credentials(...)
— Authorization wins when present (current session credential beats
a follow-up form change), but POSTs to /wp-login.php with no Auth
header still surface their cleartext creds.
Secret-without-principal is intentional: a reset-confirm or auto-
fill abuse may carry a password without any field that maps to our
principal list. The cred row writes with principal=None — the
sha256 still correlates across services for reuse analytics.
The body capture cap bumped from 512 → 4096 chars so reasonable
form bodies aren't truncated before the cred extractor sees them;
the body stored in fields.body stays at 512 chars (display-friendly).
36 helper + emitter tests pass. Phases 4-7 still pending.
Closes the cred-coverage gap for two database services that had been
capturing only the username:
- MySQL — extends _handle_packet to read the auth-response after the
null-terminated username. mysql_native_password puts a 1-byte
length followed by 20 bytes: SHA1(password) XOR SHA1(salt +
SHA1(SHA1(password))). Plaintext irrecoverable, lands as
secret_kind="mysql_native_password" with the 20 hash bytes in
secret_b64. Hash is canonical for "hashcat -m 11200" if an operator
ever wants to crack offline.
- MSSQL — fixes a pre-existing bug AND adds password capture. The
prior _parse_login7_username read offsets 36/38, which is actually
ibHostName/cchHostName in the Login7 layout — username sat at
40/42 and was never touched. Replaced with _parse_login7_creds()
reading the correct offsets (40 username, 44 password). Login7
password is XOR-then-nibble-swap obfuscated against 0xa5;
_deobfuscate_login7_password reverses it. Plaintext-recoverable,
lands as secret_kind="plaintext".
The pre-existing test_login7_auth_logged_and_closes only verified the
error response ships and the connection closes; it didn't validate
the parsed username, so the hostname-as-username bug was silent. New
tests cover both the deobfuscation algorithm directly and the full
ingester round-trip for both services.
Sync: copies the canonical syslog_bridge.py into mysql/ and mssql/
template build contexts so service_testing tests load the version
with classify_authorization + encode_secret available.
37 tests pass in the touched scope. Phases 3-7 still pending.
Closes the cred-coverage gap for 7 services that already had the data
on the wire but never landed it in the Credential table:
- SNMP — community string lands as secret_kind="snmp_community",
principal=None (v1/v2c has no per-user identity, the community IS
the auth).
- SIP — Digest response hash, previously buried in the auth= header
dump, now classify_authorization()-extracted.
- HTTP / HTTPS — Authorization header was in the headers JSON but
never extracted. Now Basic decodes to plaintext, Bearer →
http_bearer (principal=None), Digest → http_digest_md5.
- K8s — already extracted Authorization but didn't normalize. Service-
account JWTs flow through as Bearer.
- Docker API — headers absent entirely. Adds the headers JSON dump
and runs Authorization through the classifier.
- Elasticsearch — five distinct request handlers; each gains a
per-handler _cred_fields() helper.
Adds canonical templates/syslog_bridge.py:classify_authorization().
Recognised: Basic / Bearer / Token / Digest. Unknown schemes (NTLM,
AWS4-HMAC, Negotiate) return None; the header still rides in the
ambient SD-block but isn't normalized as a credential. The SD shape
on the wire collapses sip_digest_md5 into http_digest_md5 — same
algorithm, so cross-protocol reuse correlates correctly when (rare)
nonce collisions allow.
Drive-by repair of tests/core/test_fingerprinting.py:
- The pre-existing `test_http_useragent_extracted` asserted both that
add_bounty was called exactly once AND that the UA payload carried
`path` and `method` fields. Both wrong since this session opened:
the http_quirks fingerprint added later fires too, and the UA
payload never actually included path/method despite the assertion.
- Adds `path`/`method` to the UA fingerprint payload (real operator
value: "Nikto hit /admin" beats "Nikto seen on this decky").
- Replaces `assert_awaited_once` with a `_find_ua_bounty()` helper
that filters add_bounty calls by `fingerprint_type`. New fingerprint
families landing later won't retroactively break old tests.
- Updates the two credential-bearing tests to use the post-DEBT-039
native shape (`secret_b64` / `principal`) and `upsert_credential`,
not the deleted legacy `username+password` adapter.
Also rebuilds the per-service fake `syslog_bridge` modules in
tests/service_testing/{conftest,test_imap,test_pop3,test_snmp,test_mqtt,test_smtp}.py
to expose `encode_secret` + `classify_authorization`. Service templates
that import either now no longer fail at test collection.
173 tests pass in the touched scope. Phases 2-7 still pending.
Honest correction to the "every cred-emitting service" claim. Audit
of templates/* found three gaps:
1. MQTT — was working through the legacy adapter, silently dropped
when Phase 3 (e696c2b) deleted it. Now migrated to encode_secret()
alongside the others.
2. Postgres — `auth, pw_hash=…` event captures the MD5
challenge-response the attacker sent. Plaintext irrecoverable, so
it never fit the (principal, secret_b64=raw_bytes) shape. Lands
in Credential as secret_kind="postgres_md5_challenge".
3. VNC — `auth_response, response=…hex` event captures the 16-byte
DES-encrypted challenge. Same situation as Postgres: plaintext
irrecoverable. Lands as secret_kind="vnc_des_response".
Adds a `secret_kind` discriminator column to Credential (default
"plaintext", indexed). The dedup tuple gains secret_kind so two
credentials with the same sha256 but different kinds are
fundamentally different rows — different challenges produce
different bytes for the same plaintext password, so cross-kind
reuse matches are meaningless and would only confuse analytics.
The model now genuinely covers every cred-emitting service in the
fleet:
plaintext SSH, Telnet, FTP, POP3, IMAP, SMTP, Redis, LDAP,
MQTT
postgres_md5_* Postgres
vnc_des_response VNC
Username-only services (MySQL/MSSQL — TDS pre-encryption captures
the user but never sees the password byte) intentionally don't feed
Credential — they're recon signals, not cred attempts.
40 tests pass in the touched scope. New cases: secret_kind dedups
independently in the repo; Postgres MD5 + VNC DES emitters thread
through; MQTT round-trips through the native branch.
Phase 3/3 of DEBT-039. Now that all six cred-emitting services
(SSH, Telnet, FTP, POP3, IMAP, SMTP, Redis, LDAP) emit the universal
`secret_b64`-bearing SD shape, the ingester's legacy fork has no
live emitters to handle. Deletes:
- `_ingest_credential_legacy()` — synthesized native fields from
username+password
- The `elif _fields.get("username") and _fields.get("password")`
branch in `_extract_bounty`
- `_printable_filter()` — only the legacy adapter called it; the
native branch trusts the emitter (encode_secret() in Python or
sd_escape() in C) to have already sanitized
- The legacy-adapter test cases in tests/web/test_ingester.py;
their coverage moved to tests/services/test_cred_emitters.py
per-service in Phase 2
The cred path is now single-shape end-to-end. A pre-migration log
row carrying only username+password silently produces no Credential
write — by design, since no current emitter writes that shape and
keeping a code path alive for theoretical legacy data risks masking
emitter regressions. Pre-v1: any historical Bounty cred rows from
before commit 2f47f67 stay untouched.
DEBT-039 marked resolved with summary of the three commits and the
silent-loss bug fix for Redis + LDAP that fell out of execution.
Phase 2/3 of DEBT-039. Switches FTP, POP3, IMAP, SMTP, Redis, and
LDAP from the legacy `username=` + `password=` SD-block shape to the
universal credential shape (`principal=` + `secret_printable=` +
`secret_b64=`) the new Credential storage model expects.
Pattern is uniform across all six services:
_log("auth_attempt", username=u, principal=u, **encode_secret(pw))
Each service emits the canonical SD keys. The ingester's native-shape
branch (introduced in 2f47f67) now writes their cred attempts
directly without going through the legacy adapter. Once Phase 3
removes the adapter the contract becomes single-shape.
Per-service notes:
- POP3 / IMAP — `status="success"|"failed"` renamed to
`outcome="success"|"failure"` to match Credential.outcome's
vocabulary; the ingester reads outcome directly.
- SMTP — AUTH path migrated; in addition the existing mail_from
event now exposes a parsed `domain=` field alongside the original
`value=` so future "what domains do attackers spoof from" analytics
have an indexed field. Not stored in Credential — regular Log row.
- Redis — was silently dropped by the legacy adapter (no `username`
field). Native branch handles `principal=None` correctly. BONUS
FIX: the Redis 6+ ACL syntax `AUTH <user> <pw>` now captures the
ACL username as principal (was previously discarded).
- LDAP — was silently dropped by the legacy adapter (no `password`
recognition for the `bind` event). Now lands as
`principal=<dn>`. BONUS FIX.
Tests (tests/services/test_cred_emitters.py, 9 cases):
- per-service native-shape ingest path produces correct Credential
rows; outcome maps for POP3/IMAP; principal=None for legacy Redis
AUTH; principal=dn for LDAP.
- mail_from event does NOT trigger a credential write (it's a
Log-only observation, not auth).
- 0xff/NUL/ANSI bytes in passwords survive losslessly through
secret_b64 even when secret_printable is sanitized.
Phase 3 deletes the legacy adapter once all migrations land — the
adapter has no live emitters to handle anymore.
Phase 1/3 of DEBT-039. Adds the Python emitter-side counterpart to
auth-helper.c's sd_escape + base64 logic so service templates can
emit the universal credential SD shape with a single spread:
_log("auth_attempt", principal=user, **encode_secret(password))
secret_printable mirrors the C helper's [0x20, 0x7f) → '?' contract;
secret_b64 preserves the ORIGINAL utf-8 bytes losslessly so non-ASCII
or control-byte payloads survive as fingerprinting signal even when
the printable form sanitizes them.
The canonical syslog_bridge.py is what _sync_logging_helper()
propagates into per-template build contexts at deploy time, so any
service that imports its local syslog_bridge picks this up
automatically on next rebuild.
Phase 2 migrates the six cred-emitting service templates (FTP, POP3,
IMAP, SMTP, Redis, LDAP) onto this helper. Phase 3 deletes the
ingester's legacy adapter once nothing emits the old shape.
Replaces the opaque Bounty.bounty_type='credential' path with a
dedicated `credentials` table whose schema is forward-compatible
across every auth-bearing service in the fleet. Hoisted indexed
columns (secret_sha256, principal, service, attacker_ip) carry the
universal reuse-analytics signal; service-specific JSON keys ride
in `fields`. Cross-service reuse queries become an indexed lookup
on secret_sha256 instead of JSON_EXTRACT scans.
Schema decisions baked in (per ANTI):
- New `Credential` table, not extension to Bounty
- Hoisted `principal` column for cross-service principal-reuse
- Standardized JSON keys: every payload carries secret_b64 +
secret_printable + principal universally; service-specific extras
(user, domain, dn, mech, …) ride alongside
The auth-helper SD-block emits the new shape natively. The ingester
forks at _extract_bounty:
- Native shape (SSH/Telnet, future emitters): secret_b64 present →
direct upsert_credential
- Legacy shape (FTP/POP3/IMAP/SMTP today): username + password →
adapter synthesizes secret_{b64,sha256,printable} on the fly,
upserts into the same Credential table. Tracked as DEBT-039;
one-shot bridge until those service templates migrate.
Defense-in-depth across five layers (input validation):
- C helper: bytes outside [0x20, 0x7f) collapse to '?', RFC 5424
escape rules for \\, ", ]; b64 preserves exact bytes
- Ingester native branch: rejects malformed secret_b64 (regex), drops
the credential row but keeps the underlying Log
- Ingester legacy adapter: same printable-ASCII filter as the C
code; sha256 + b64 over the original utf-8 bytes (lossless, even
when secret_printable is sanitized)
- DB column caps with truncation warning; sha256 always over the
full pre-truncation bytes so reuse queries match across truncation
- JSON serialized with ensure_ascii=True so utf8mb4 columns stay
safe even with non-ASCII service-specific keys
Bounty.bounty_type='credential' is no longer written. Pre-v1: no
historical backfill; existing rows stay untouched but unused.
595 tests pass; new tests cover the model + repo (upsert dedup,
null-principal independence, cross-service reuse, filters), both
ingester branches, b64 validation, sanitization preserving the
fingerprinting signal in b64.
Promotes auth-helper.c to decnet/templates/_shared/auth-helper/ and
adds _sync_auth_helper_sources() — mirrors the existing sessrec sync
pattern that keeps shared sources in step with per-template build
contexts.
Telnet's image grows the same multi-stage musl build, COPY of the
static helper into /usr/sbin/auth-helper, and prepended pam_exec line
in /etc/pam.d/login. Pulls in the `login` package (real Debian
PAM-aware /bin/login, replacing busybox's PAM-less applet) and
libpam-modules transitively for pam_exec.so.
Verified inside the rebuilt telnet image:
- /bin/login is the real 53KB Debian binary (PAM-aware)
- /etc/pam.d/login top line is the auth-helper hook
- pam_exec.so present at /usr/lib/x86_64-linux-gnu/security/pam_exec.so
- helper smoke-run emits correct RFC 5424 line for `telnetpw` →
password_b64="dGVsbmV0cHc="
SSH Dockerfile updated to read auth-helper.c from auth-helper/
subdirectory so both templates use the synced layout. The canonical
source lives in _shared/; per-template copies are tracked in git AND
synced at deploy time so a drift on either side rebases on the next
deploy.
Closes the telnet half of DEBT-038's #5 follow-up.
Real OpenSSH doesn't log attempted passwords — only success/failure
with username — leaving SSH the sole auth-bearing service in the
fleet that contributes nothing to the cred corpus FTP/MySQL/RDP/
VNC/etc. populate. Closes that gap with a tiny pam_exec shim.
A static C helper (~80 LoC, musl, ~38KB stripped) is wired into
/etc/pam.d/sshd as `auth optional pam_exec.so expose_authtok stdout
/usr/sbin/auth-helper`. pam_exec writes the attempted password to
the helper's stdin NUL-terminated; the helper formats an RFC 5424
line in the exact shape templates/syslog_bridge.py produces
(facility local0, PEN 55555, MSGID auth_attempt — same MSGID FTP
uses) and writes it to /proc/1/fd/1 so the existing collector
stdout-reader pipeline picks it up.
Two password fields ride in the SD-block:
- password= RFC 5424 escaped, ASCII-printable only, ? for non-
printables. FTP-compatible — existing dashboard
rendering picks up SSH attempts unchanged.
- password_b64= base64 of the exact PAM_AUTHTOK bytes. Preserves
NUL/0xff/control-byte fingerprinting signal that the
plain field necessarily drops.
Fail-open by design: the PAM line is `optional` so a malfunctioning
helper never blocks sshd auth. Better to miss a cred than break the
honeypot.
Verified end-to-end inside the rebuilt image:
- 38KB static ELF, runs without a dynamic linker
- correct RFC 5424 line for `hunter2` → b64 `aHVudGVyMg==`
- NUL truncation matches pam_exec's contract
- 0xff bytes survive losslessly through password_b64
- empty password produces a well-formed line (e.g. pubkey auth path)
Attacker list cards gain an AS<number> chip with the AS description
on hover. Attacker detail page adds an AS row beside ORIGIN — same
shape as the existing country/source pair so operators can read
"this attacker is in DE on AS24940 Hetzner" at a glance instead of
having to grep the IP into a separate tool.
Both fields collapse to "unknown" when the IP isn't BGP-announced
(CGNAT, dark space, RFC1918), matching the existing pattern for
country resolution.
Adds asn (int), as_name (varchar 128), asn_source (varchar 16) to
the Attacker SQLModel — direct columns, no _migrate_* helper per
feedback_no_new_migrations_prev1.
Profiler worker now calls decnet.asn.enrich_ip alongside the existing
geoip enrich_ip; both feed the upsert payload. Failure is total — if
either lookup throws or the IP is private/unannounced, the field stays
None and the row still writes.
Both lookups are independent: a CGNAT address can have a country (RIR
allocation) but no ASN (no BGP origin), and vice-versa for unrouted
RIR-allocated space. Storing them separately preserves that signal.
Mirrors decnet/geoip/ end-to-end: paths/base/factory/lookup at the
package level, iptoasn/ subpackage holds the data-source-specific
fetch+parse+provider. AsnLookup is bisect-indexed over (start, end,
AsnInfo) ranges with a pickled cache invalidated on raw-file mtime
bump.
Why iptoasn (and not bgp.tools / Team Cymru): public-domain dump,
zero attribution, no UA mandate, daily refresh — keeps DECNET stealth
intact (the geoip/rir module's "never identify as DECNET" comment
applies the same way here). bgp.tools' ToS would have required an
identifying UA, conflicting with feedback_stealth.
Public surface: decnet.asn.enrich_ip(ip) -> (asn, name, source) or
all-None on miss/disabled. Same shape as decnet.geoip.enrich_ip so
the profiler can compose them in one call site.