Commit Graph

79 Commits

Author SHA1 Message Date
681931d9bb docs(roadmap): tick certificate details and three sibling roadmap items 2026-04-28 11:41:17 -04:00
49da15823f refactor(realism): single source of truth for persona→login
decnet/realism/naming._home and decnet/canary/cultivator._persona_login
both normalised "John Smith"→"johnsmith" with identical logic. Lift
to decnet.realism.personas.login_for(persona) and have both consumers
import it. Drift between the two would have left canary placement and
realism path naming using different login derivations.
2026-04-27 17:39:04 -04:00
10fa8a84d1 docs(roadmap): mark TTL + TCP/IP stack fingerprinting complete
TTL extraction was already wired in the active prober and passive sniffer
plus profiler rollup; the checkbox was just stale. TCP/IP stack now
includes ToS/DSCP/ECN, IP-ID sequence classification, and ISN sequence
classification as of the previous three commits.
2026-04-26 20:30:46 -04:00
8d1c449173 docs(debt): log DEBT-042 + DEBT-043 from orchestrator UI scope
DEBT-042 — orchestrator failure-count badge is computed from the
in-memory SSE window; remediation is a dedicated stats endpoint.

DEBT-043 — no frontend test framework configured; the planned
Orchestrator.tsx component test couldn't be written without first
adding vitest + RTL.
2026-04-26 20:01:58 -04:00
943bb3a39d docs(identity): resolve merge revocability + SSE open questions
Open Question 1 (merge revocability): adopted. The clusterer will
clear merged_into_uuid on contradicting evidence and publish a new
identity.unmerged topic alongside the existing three identity.* topics
so subscribers on identity.> get it from day one.

Open Question 2 (AttackerDetail UX on identity_id change): adopted
SSE over refresh-on-focus. New endpoint will mirror the existing
topology mutator SSE (bus subscription on identity.>, JWT via ?token=,
snapshot-on-connect + live forward).

Risk 2 (API URL stability for soft-merged loser UUIDs): struck —
already shipped in commit dc3d08d (read-only API follows
merged_into_uuid and surfaces the canonical winner).
2026-04-26 07:33:36 -04:00
7904ef1308 docs(identity): IDENTITY_RESOLUTION.md design spec
Pre-implementation design for the observation/identity/campaign
three-level hierarchy. Sibling-add approach (not rename) — keep the
attackers table name, add attacker_identities as a sibling, nullable
attackers.identity_id FK. Documents the rationale, schema, bus
topics, API surface, and the 5-commit implementation sequence.

Companion to development/CAMPAIGN_CLUSTERING.md. Substrate for the
clusterer worker designed there; ships empty so the campaign
clustering fixtures can encode honest multi-row-per-actor scenarios.
2026-04-26 06:56:40 -04:00
00254629f8 feat(clustering): UKC phase enum + synthetic campaign factory + metric harness
Pre-implementation scaffolding for campaign clustering. The simulator is
the spec — algorithm code follows once fixtures + metrics are stable.

* decnet/clustering/ukc.py — UKCPhase enum (19 phases across In/Through/Out
  stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns
  with future MITRE ATT&CK tagging so synthetic data and runtime phase
  inference don't need renaming when TTP-tagging lands.
* tests/factories/campaign_factory.py — YAML DSL parser + deterministic
  generator emitting truth-labeled SyntheticAttacker / SyntheticSession
  records. Validates phase names, warns on unobservable phases, supports
  multi-campaign + noise corpora.
* tests/clustering/metrics.py — pure-Python ARI / homogeneity /
  completeness / singleton_recall (no sklearn dep). Decided before any
  algorithm exists, on purpose.
* tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml} — fixture 3
  from the design doc; simplest of the six, exercises the full pipeline
  with an identity-clusterer placeholder.
* development/CAMPAIGN_CLUSTERING.md — design spec for the feature.
* development/DEVELOPMENT_V2.md — note on DSL evolution path
  (concurrent phases, multi-actor per phase) deferred post-v1.
2026-04-26 06:29:10 -04:00
3eb67c9400 refactor(intel): re-key attacker_intel on attacker_uuid (closes DEBT-041)
The threat-intel surface was IP-keyed on day one as an expedient — the
worker is woken by IP-bearing bus events. ANTI's call: don't carry that
debt. NO IPs as primary keys anywhere on the attacker-intel surface.

Schema:
- attacker_uuid is now the canonical key — UNIQUE + FK to attackers.uuid.
- attacker_ip stays as a denormalised, indexed, NON-UNIQUE value column.
  Updated on every upsert; useful for SIEM payloads and audit lookups,
  but explicitly NOT a key. Model docstring says so.
- Pre-v1, no Alembic migration needed. SQLModel.metadata.create_all()
  builds the new shape on fresh DBs.

Repo:
- upsert_attacker_intel now keys on attacker_uuid.
- get_attacker_intel_by_ip → get_attacker_intel_by_uuid.
- get_unenriched_attacker_ips → get_unenriched_attackers, returning
  [{uuid, ip}] tuples so the worker writes by UUID and dispatches
  provider calls by IP without a second round-trip.

Worker:
- _enrich_one(uuid, ip, ...) — UUID lands on the row, IP rides for
  provider egress.
- attacker.intel.enriched bus payload gains attacker_uuid alongside
  attacker_ip — webhook → SIEM consumers benefit; no removal.

API:
- GET /api/v1/attackers/{ip}/intel deleted outright (rip-and-replace,
  never deployed beyond dev).
- GET /api/v1/attackers/{uuid}/intel is the only public route, matching
  every other /attackers/* route.

Frontend:
- <IntelPanel uuid={id!} /> uses the URL param directly, fetches in
  parallel with the rest of AttackerDetail rather than waiting on
  attacker.ip.

Tests: re-keyed in place, 39 passed (same coverage as before the
refactor). Provider-impl tests untouched.

DEBT-041: closed in DEBT.md (entry preserved as historical rationale,
summary table flipped to , remaining-open list shortened by one).
2026-04-26 05:35:29 -04:00
a009549326 feat(web): IntelPanel on AttackerDetail + DEBT-041 entry
Read-only IP-keyed intel surface on the attacker detail page. Renders
the aggregate verdict (color-coded MALICIOUS/SUSPICIOUS/BENIGN/NO SIGNAL)
plus a per-provider row with verdict, queried-at timestamp, and
provider-specific detail (GreyNoise classification, AbuseIPDB
0-100 score, Feodo C2 listing + malware family, ThreatFox IOC match
+ malware family). 404 from the API renders as 'NO INTEL CACHED YET'
with a hint that decnet enrich will populate it on the next pass —
TTL drives the refresh, no manual button.

DEBT-041 documents the API/UI IP-keying as a v1 expedient that will
need a UUID-keyed sibling endpoint before federation lands. NAT
collisions, attacker.uuid consistency across attacker routes, and the
sequential-fetch UX are all callouts on that ticket; the migration
sketch is laid out so the v1.x followup is unambiguous.

Frontend build: clean (55.58 kB AttackerDetail bundle, +~5kB for the
panel). Note: not browser-tested in this session — recommend a manual
smoke against a deployed master before tagging.
2026-04-26 05:25:25 -04:00
4ec0dd75c8 docs(roadmap): mark threat-intel enrichment shipped
Out-of-band 'decnet enrich' worker landed across commits feat(intel):
attacker_intel table → factory → providers → worker → CLI → API. v1
ships GreyNoise Community + AbuseIPDB + abuse.ch (Feodo Tracker bulk
feed and ThreatFox per-IP). Shodan / Censys / OTX remain in the
DEVELOPMENT_V2 backlog.
2026-04-26 05:18:05 -04:00
0dd3811436 feat(intel): attacker_intel table + repo helpers
New TTL-cached threat-intel row keyed by attacker IP, with per-provider
verdict/raw/queried_at columns for GreyNoise, AbuseIPDB, abuse.ch Feodo
Tracker and ThreatFox. Carries schema_version from day one (federation
wire-format precedent set by SessionProfile). Repo gains
upsert_attacker_intel, get_attacker_intel_by_ip, and a
get_unenriched_attacker_ips backfill primitive that picks fresh + stale
rows for the forthcoming 'decnet enrich' worker.

Also documents the open-source intel-source backlog in DEVELOPMENT_V2.
2026-04-26 04:56:47 -04:00
5fb7ebe433 docs(debt): close standalone graph-correlator follow-up
Library shape (decnet/correlation/) consumed by profiler + reuse
correlator is the right model. The `decnet correlate` CLI helper has
been removed in the previous commit.
2026-04-26 04:26:49 -04:00
bf87f8794a feat(dashboard): credential reuse tab, drawer, and bidirectional badge
Adds a CREDS/REUSE tab segment on the Credential Vault page. The REUSE
tab lists CredentialReuse rows (paginated 25 per page) ordered by
target_count desc; row-click opens a drawer mirroring the credentials
inspector with a deckies x services grid, attacker links, and a
PROFILING PENDING placeholder when attacker_uuids has not been
backfilled yet.

The CREDS tab gains a REUSE column showing a clickable target-count
badge for credentials whose (sha256, kind, principal) tuple matches a
reuse row; clicking the badge fetches and opens that row's drawer.

Section header gains a manual refresh button (no SSE/polling).

Ticks the credential-reuse line in DEVELOPMENT.md and notes the
vectorstore scaffold.
2026-04-26 03:55:56 -04:00
b3d1301925 feat(creds): DEBT-040 Phase 3 — RDP NLA / CredSSP NTLMv2 capture
When RDP_ENABLE_NLA=true (service_cfg.nla=true on the topology side),
confirm PROTOCOL_HYBRID on the X.224 Connection Confirm, upgrade the
socket to TLS using a self-signed cert generated at first start by
the entrypoint, then drive a tiny CredSSP loop:

- Read inbound TSRequest DER (bounded to MAX_TSREQUEST_LEN).
- Scan for the NTLMSSP signature, dispatch on message type:
  Type 1 -> respond with a hand-built TSRequest carrying our Type 2
  challenge. Type 3 -> parse_type3() and emit auth_attempt with the
  universal credential SD shape (secret_kind = ntlmssp_v2).
- Hand-built DER: no pyasn1 dependency.

Also folds in a small fix-up to commit 1: SMB SERVER_CHALLENGE was
hardcoded to 0x11..0x88 across the fleet, which would let a scanner
fingerprint every DECNET decky by its NTLM challenge. Both SMB and
RDP now derive the 8-byte challenge from
instance_seed.random_bytes(8, "ntlm_challenge"), giving each decky a
deterministic-but-distinct value. SMB Dockerfile gets the
instance_seed.py copy too (was synced into the build context but not
COPYed into the image).

- decnet/services/rdp.py: optional service_cfg.nla bool flips
  RDP_ENABLE_NLA in the compose env.
- decnet/templates/rdp/Dockerfile + entrypoint.sh: openssl install +
  per-decky cert generation gated on RDP_ENABLE_NLA.
- 9 NLA unit tests cover the DER reader/builder, _handle_nla round-
  trip with Type 1 / Type 3, oversized-DER rejection, and per-
  NODE_NAME challenge divergence.
- DEBT.md: DEBT-040 closed; full TS_INFO_PACKET capture documented as
  a follow-up if attacker telemetry justifies it.
2026-04-25 07:42:52 -04:00
afe02af5c2 feat(creds): NTLMSSP Type 3 parser + DEBT-040 for SMB/RDP/NLA framers
Ships the load-bearing primitive both Phase 5 (SMB) and Phase 7
(RDP NLA) need: a standalone NTLMSSP Type 3 (AUTHENTICATE_MESSAGE)
parser per MS-NLMP §2.2.1.3.

Surface:
  parse_type3(blob) -> dict | None
  find_ntlmssp(buf) -> int   # locate NTLMSSP\\0 inside SPNEGO outer

Returns the universal Credential SD shape:
  username + domain (decoded UTF-16-LE or ASCII per NEGOTIATE_UNICODE)
  principal = "DOMAIN\\\\username"
  secret_kind = "ntlmssp_v1" (24-byte fixed) or "ntlmssp_v2" (variable)
  secret_b64 = base64 of NtChallengeResponse — canonical hashcat input
               (-m 5500 v1, -m 5600 v2)

Bounds-checked for untrusted-input safety. Anonymous binds (empty NT
response) return None — no credential to record.

7 unit tests cover NTLMv1/v2 distinction, ASCII vs Unicode strings,
empty-domain shape, malformed signature/type rejection, and SPNEGO-
wrapped find_ntlmssp() lookup.

DEBT-040 opens to track the three remaining protocol framers that
will consume this parser:
  - SMB: hand-rolled SMB2 + Session Setup framer (~200 LoC) replacing
    Impacket's opaque SimpleSMBServer
  - RDP basic auth: TPKT/X.224/MCS framer for legacy plaintext path
    (~150 LoC)
  - RDP NLA: TLS upgrade + CredSSP TSRequest parser, reuses parse_type3
    via the SPNEGO inner blob (~250 LoC)

These are substantial protocol implementations each — landing them
inline with Phase 1-3+6's cred coverage rollout would have inflated
the session beyond reasonable scope. Cred-reuse analytics already work
across the 12 services covered in this session; the deferred three
just round out the fleet.
2026-04-25 07:19:30 -04:00
6b16c844b6 fix(creds): MQTT regression + secret_kind for hash credentials
Honest correction to the "every cred-emitting service" claim. Audit
of templates/* found three gaps:

1. MQTT — was working through the legacy adapter, silently dropped
   when Phase 3 (e696c2b) deleted it. Now migrated to encode_secret()
   alongside the others.
2. Postgres — `auth, pw_hash=…` event captures the MD5
   challenge-response the attacker sent. Plaintext irrecoverable, so
   it never fit the (principal, secret_b64=raw_bytes) shape. Lands
   in Credential as secret_kind="postgres_md5_challenge".
3. VNC — `auth_response, response=…hex` event captures the 16-byte
   DES-encrypted challenge. Same situation as Postgres: plaintext
   irrecoverable. Lands as secret_kind="vnc_des_response".

Adds a `secret_kind` discriminator column to Credential (default
"plaintext", indexed). The dedup tuple gains secret_kind so two
credentials with the same sha256 but different kinds are
fundamentally different rows — different challenges produce
different bytes for the same plaintext password, so cross-kind
reuse matches are meaningless and would only confuse analytics.

The model now genuinely covers every cred-emitting service in the
fleet:

  plaintext        SSH, Telnet, FTP, POP3, IMAP, SMTP, Redis, LDAP,
                   MQTT
  postgres_md5_*   Postgres
  vnc_des_response VNC

Username-only services (MySQL/MSSQL — TDS pre-encryption captures
the user but never sees the password byte) intentionally don't feed
Credential — they're recon signals, not cred attempts.

40 tests pass in the touched scope. New cases: secret_kind dedups
independently in the repo; Postgres MD5 + VNC DES emitters thread
through; MQTT round-trips through the native branch.
2026-04-25 06:16:57 -04:00
e696c2beb3 refactor(ingester): drop legacy cred adapter — DEBT-039 closed
Phase 3/3 of DEBT-039. Now that all six cred-emitting services
(SSH, Telnet, FTP, POP3, IMAP, SMTP, Redis, LDAP) emit the universal
`secret_b64`-bearing SD shape, the ingester's legacy fork has no
live emitters to handle. Deletes:

- `_ingest_credential_legacy()` — synthesized native fields from
  username+password
- The `elif _fields.get("username") and _fields.get("password")`
  branch in `_extract_bounty`
- `_printable_filter()` — only the legacy adapter called it; the
  native branch trusts the emitter (encode_secret() in Python or
  sd_escape() in C) to have already sanitized
- The legacy-adapter test cases in tests/web/test_ingester.py;
  their coverage moved to tests/services/test_cred_emitters.py
  per-service in Phase 2

The cred path is now single-shape end-to-end. A pre-migration log
row carrying only username+password silently produces no Credential
write — by design, since no current emitter writes that shape and
keeping a code path alive for theoretical legacy data risks masking
emitter regressions. Pre-v1: any historical Bounty cred rows from
before commit 2f47f67 stay untouched.

DEBT-039 marked resolved with summary of the three commits and the
silent-loss bug fix for Redis + LDAP that fell out of execution.
2026-04-25 06:04:09 -04:00
2f47f67eef feat(creds): future-proof Credential storage model
Replaces the opaque Bounty.bounty_type='credential' path with a
dedicated `credentials` table whose schema is forward-compatible
across every auth-bearing service in the fleet. Hoisted indexed
columns (secret_sha256, principal, service, attacker_ip) carry the
universal reuse-analytics signal; service-specific JSON keys ride
in `fields`. Cross-service reuse queries become an indexed lookup
on secret_sha256 instead of JSON_EXTRACT scans.

Schema decisions baked in (per ANTI):
- New `Credential` table, not extension to Bounty
- Hoisted `principal` column for cross-service principal-reuse
- Standardized JSON keys: every payload carries secret_b64 +
  secret_printable + principal universally; service-specific extras
  (user, domain, dn, mech, …) ride alongside

The auth-helper SD-block emits the new shape natively. The ingester
forks at _extract_bounty:
- Native shape (SSH/Telnet, future emitters): secret_b64 present →
  direct upsert_credential
- Legacy shape (FTP/POP3/IMAP/SMTP today): username + password →
  adapter synthesizes secret_{b64,sha256,printable} on the fly,
  upserts into the same Credential table. Tracked as DEBT-039;
  one-shot bridge until those service templates migrate.

Defense-in-depth across five layers (input validation):
- C helper: bytes outside [0x20, 0x7f) collapse to '?', RFC 5424
  escape rules for \\, ", ]; b64 preserves exact bytes
- Ingester native branch: rejects malformed secret_b64 (regex), drops
  the credential row but keeps the underlying Log
- Ingester legacy adapter: same printable-ASCII filter as the C
  code; sha256 + b64 over the original utf-8 bytes (lossless, even
  when secret_printable is sanitized)
- DB column caps with truncation warning; sha256 always over the
  full pre-truncation bytes so reuse queries match across truncation
- JSON serialized with ensure_ascii=True so utf8mb4 columns stay
  safe even with non-ASCII service-specific keys

Bounty.bounty_type='credential' is no longer written. Pre-v1: no
historical backfill; existing rows stay untouched but unused.

595 tests pass; new tests cover the model + repo (upsert dedup,
null-principal independence, cross-service reuse, filters), both
ingester branches, b64 validation, sanitization preserving the
fingerprinting signal in b64.
2026-04-25 05:29:26 -04:00
50c12d9e16 docs(debt): DEBT-038 #5 closed by telnet extension f1026b4 2026-04-25 04:53:04 -04:00
f5a9e10bdc docs(debt): DEBT-038 SSH PAM cred-capture limitations 2026-04-25 04:44:44 -04:00
c69fdbb4ac docs(roadmap): mark ASN lookup, GeoIP mapping, PTR records shipped 2026-04-25 04:03:11 -04:00
77a19ffe9f docs(roadmap): mark MazeNET SWARM topology deployment shipped 2026-04-25 03:42:32 -04:00
2bcef50ac5 feat(webhooks): circuit breaker auto-disables misbehaving subscriptions
After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed
deliveries, the worker calls trip_webhook_circuit(uuid, ts) which
flips enabled=False and stamps auto_disabled_at. The worker sets its
reload flag so the next dispatch epoch stops consuming events for the
tripped sub entirely — one dead receiver can't poison the shared
egress pool anymore.

Operator clears the trip via PATCH — setting enabled=True when the
sub was previously disabled clears auto_disabled_at, zeros
consecutive_failures, and clears last_error. Admin-pause → re-enable
hits the same path harmlessly.

Three observable states now distinguishable in the UI:
- Active              enabled=True,  auto_disabled_at=NULL
- Admin-paused        enabled=False, auto_disabled_at=NULL
- Tripped             enabled=False, auto_disabled_at=<ts>

UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and
a "N TRIPPED" count in the page header. Hover tooltip tells the
operator how to reset ("Re-enable via Edit").

record_webhook_failure now returns the new consecutive_failures count
so the worker can compare against the threshold without a second
roundtrip. trip_webhook_circuit is idempotent — re-tripping just
re-stamps auto_disabled_at.

Closes THREAT_MODEL WH-02 and DEBT-037 §1.
2026-04-24 16:24:33 -04:00
c2ff8d1a4f docs(debt): DEBT-037 — webhook delivery guarantees beyond MVP
The webhook MVP shipped with deliberate deferrals; this entry names
them so future PRs know exactly what's left to close: circuit
breaker, dead-letter table, delivery audit log, batch/coalescing,
per-subscription rate limiting, payload templates per destination,
and secret encryption at rest.

Non-negotiable even at MVP scope (HMAC signing, bus-off degraded
mode, jittered retry backoff) is called out explicitly to prevent
future contributors from weakening it under the banner of
"simplification."
2026-04-24 16:03:33 -04:00
638236113d feat(webhooks): non-blocking http:// warning + WH-03 accepted risk
WebhookResponse now carries a `warnings: list[str]` field. When the
subscription's URL starts with http://, an `insecure_url` advisory is
surfaced on every GET/CREATE without blocking the request. HMAC still
detects tampering regardless of transport — only read-confidentiality
is lost over plaintext — and test/dev environments without TLS stay
usable.

Matches the operator-trust posture already established by DA-06
(admin-on-admin protection is out of scope). The alternative — hard
rejection at admin time — was considered and declined; warning-plus-
visibility is the right shape.

THREAT_MODEL WH-03 accepted risk registered; revisit triggers are
multi-admin delegation, a regulated customer, or an operator ticket
asking for a DECNET_WEBHOOK_REQUIRE_HTTPS enforcement knob.
2026-04-24 15:53:30 -04:00
f84bf82f6c docs(webhook): roadmap tick + threat-model component
- DEVELOPMENT.md: tick the "Real-time alerting" roadmap item with a
  note that Slack/Telegram-specific senders remain per-destination
  follow-ups (they accept generic webhook payloads already).
- THREAT_MODEL.md: new Component 2 — DECNET↔External webhook
  destination. DFD, full STRIDE table, WH-01 (secret at rest) and
  WH-02 (half-dead-receiver retry waste) registered as accepted
  risks pointing at DEBT-037 for post-MVP hardening. Checklist lists
  two open items: OpenAPI schema omits `secret`, and http:// URL
  rejection at admin time.
2026-04-24 15:48:14 -04:00
162f7c1194 feat(api/sse): per-user connection cap + viewer-safe invariant
New decnet/web/sse_limits.py provides sse_connection_slot, an async
context manager that counts live SSE connections per user UUID and
raises 429 when a per-user cap is exceeded (default 5, override via
DECNET_SSE_MAX_PER_USER). Wired into both SSE generators as their
first async with, so the cap check fires before any stream data is
yielded.

The cap must sit inside the generator — StreamingResponse returns
before the generator body runs, so a handler-level wrapper would
release the slot immediately. Put prefetch + slot + loop all under
the one async with.

Also documents F6/I (role leakage) as mitigated-by-construction via
handler docstrings: every event type on both streams wraps data
already reachable via viewer-gated REST, so no per-event filter is
needed until a new event family is introduced. The invariant is
written into the handler docstrings so a future PR can't silently
add admin-only events.

Resolves THREAT_MODEL F6/I and F6/D.
2026-04-24 15:01:20 -04:00
df84981954 feat(api): pin response_model on dict-returning mutation routes
Every mutation route that returned an untyped dict now declares
response_model at the decorator. MessageResponse covers the eight
{"message": ...} envelopes (change-password, mutate-decky, mutate-
interval, update-deployment-limit, update-global-mutation-interval,
delete-user, update-user-role, reset-user-password). Purpose-built
models cover the richer shapes (DeployResponse for /deckies/deploy,
PurgeResponse for /config/reinit, ReapReportResponse for /reap-orphans,
UserResponse for /config/users). 204-No-Content and Response/
ORJSONResponse routes stay as-is.

The wire shape for clients is unchanged — the envelopes already only
shipped a message field. What changes is that a handler which
accidentally returns a richer dict (e.g. a full user row including
password_hash) would be silently stripped to the declared fields at
serialization time.

Also flips F4/D "expensive LIKE" to accepted (new DA-09) — the /logs
and /attackers search routes LIKE-scan unbounded columns, but both are
admin-gated, limit-capped, and operator rate-limit scope per DA-04.
FTS5 stays a performance TODO, not a security blocker.
2026-04-24 14:27:58 -04:00
a935bf2663 feat(api): cap offset on list-topologies and transcript endpoints
The other five query endpoints (/logs, /attackers, /attacker-commands,
/bounties, /topologies/{id}) already declared le=2147483647 on offset;
these two were inconsistently uncapped. Bring them in line to close
the F4/D deep-pagination row.

Also resolves F4/T (ORM sort injection — already mitigated by the
regex pattern on /attackers sort_by, no other route accepts a column
name) and F4/D (limit cap — already universal) with code pointers.
2026-04-24 14:14:25 -04:00
e53b580767 test(api): RBAC contract test — viewer JWT on every classified route
New test walks app.routes, classifies each APIRoute as admin/viewer/open
by identity-matching require_admin / require_viewer closures inside the
route's dependency tree, then asserts:
  - admin routes return 403 to a viewer JWT
  - viewer routes return neither 401 nor 403 to a viewer JWT
SSE routes skipped (separate scope under F6). Role hints deliberately
NOT encoded in the OpenAPI spec — classification stays server-side so
/openapi.json can't be used to enumerate admin routes.

Resolves THREAT_MODEL F2/I + F5/E; paired with the existing
test_schemathesis.py::test_auth_enforcement (401-half coverage).
2026-04-24 14:00:12 -04:00
99ccd41bb5 feat(api/artifacts): explicit Content-Disposition + X-Content-Type-Options
Harden the attacker-controlled artifact download path (F7) with explicit
response headers instead of relying on Starlette's defaults (which only
emit attachment for non-ASCII filenames and never set nosniff). Also
resolves the THREAT_MODEL F7 path-traversal row (containment check was
already in _resolve_artifact_path) and the fleet-deploy detail=str(e)
audit (all four sites are admin-gated deliberate validator UX or
structured worker-response fields).
2026-04-24 13:24:34 -04:00
3787f7e5ec docs(debt): DEBT-036 — session-profile ingester (keystroke dynamics)
The SessionProfile SQLModel table has shipped with every column
nullable since session-recording v1 landed — because the ingester
that populates them from the [t,"i",d] events in the transcript
shards does not exist yet (known as gap #2 in SIGNAL_CAPTURE_AUDIT).

A manual keystroke-dynamics pass over one real session (wget scanme.
nmap.orgh) trivially recovered CoV ≈ 0.74 (human band), a 467 ms
semantic pause before the URL argument, tight intra-word bigrams
(ge 79 ms, t<space> 83 ms), and slow start-of-action latency (w→g
225 ms) — all signals the existing schema columns were designed to
hold. So the missing piece is purely the ingester.

Entry captures:
- the manual case as the motivating + sanity-check target
  (ingester should produce CoV ≈ 0.74 ± 0.05 on the same shard),
- three schema extensions the manual analysis suggests beyond what
  the table carries today: kd_start_of_action_latency_ms,
  kd_pause_hist_{burst,think,distracted}, kd_top_bigrams,
- a non-PII discipline line: raw keystroke content (including
  captured passwords) MUST NOT land in SessionProfile columns —
  only timing and frequency aggregates.

Poll-driven ingestion can ship first; the bus-trigger path
piggybacks on DEBT-031's deferred session-boundary topics.
2026-04-24 10:41:55 -04:00
ec2360a5da docs(debt): DEBT-035 — artifacts written as the container uid, not the API's
Tracks the durable follow-up to 323077b. The transcripts soft-fail
shipped in that commit keeps the API from 500-ing on
/var/lib/decnet/artifacts/** permission mismatches, but the real
issue is that decoy containers write artifacts under a uid the API
can't read — today's workaround is a manual `sudo chown -R` after
every new deploy.

Three design options documented (container-runs-as-host-uid, setgid
+ shared group, inotify sidecar) with a recommendation, plus an
acceptance criterion: fresh init + deploy + record session → the
API can read the transcripts with no manual chown.
2026-04-24 01:21:09 -04:00
a6356abe27 docs(dev): post-v1 roadmap + check off shipped "Commands executed" item
- DEVELOPMENT_V2.md (new): post-v1 direction. Everything here is after
  the v1 box is closed — federation, advanced behavioral profiling,
  maze-scale topology work.
- DEVELOPMENT.md: flip "Commands executed" checkbox — full per-session
  command log already landed in the profiler's _extract_commands_from
  _events path.
2026-04-23 21:52:15 -04:00
ef4179ea1f feat(api): opaque 500 handler + error_id correlation for unhandled exceptions
Registers a generic @app.exception_handler(Exception) that catches anything
uncaught in route handlers / dependencies. Prod response is opaque:
{detail: 'Internal Server Error', error_id: <uuid4 hex>}. Dev mode
(DECNET_DEVELOPER=True) adds exception_type and traceback fields so
failures are debuggable without tailing server logs.

The error_id is logged alongside the full traceback server-side, letting
operators correlate a user's 500 report with the exact exception via
`grep <error_id> /var/log/decnet.log`.

FastAPI's own HTTPException routing and the existing
RequestValidationError / ValidationError / RateLimitExceeded handlers
still take precedence — this handler only fires on genuinely-uncaught
exceptions.

Flips threat model F1/I 'traceback / stack trace leakage' from ? to M
and logs a follow-up checklist entry for 4 detail=str(e) sites in the
fleet deploy router (admin-gated, different threat class, separate
audit).
2026-04-23 14:07:32 -04:00
2f4f81e5de feat(api): rate-limit /auth/login + scaffold threat model
Adds slowapi two-bucket rate limit on /auth/login — 10 attempts per
5 minutes per-IP AND per-username, tripping either → 429. Per-IP
catches botnets hitting one account; per-username catches distributed
credential stuffing against one account. In-memory storage: dashboard
API is single-process, Redis is disproportionate for v1.

X-Forwarded-For is deliberately NOT trusted (spoofable); reverse-proxy
deployments get one shared bucket per proxy IP. Logged in the threat
model as accepted risk DA-08, to be revisited when a verified-proxy
config lands.

Also scaffolds development/THREAT_MODEL.md with STRIDE-per-element
methodology, system-context DFD, and Dashboard↔API as the first fully
worked component (7 sub-flows, ~50 threat entries). F1 Authn ships
with 3 threats mitigated: rate limit (new), uniform 401 (verified
already in place), bcrypt length clamp (verified already in place via
Pydantic max_length=72).
2026-04-23 13:25:28 -04:00
6d769edce0 docs(debt): mark DEBT-034 (worker supervisor) shipped
Units + polkit rule + systemd_control helper + start endpoints +
installed flag + UI wiring all landed. SWARM-host start/stop and
crash-quarantine policy stay as named deferrals.
2026-04-22 14:14:22 -04:00
6725197d58 test(web): transcripts API + attacker-transcripts router coverage
Paging, truncation surfacing, admin gate, path traversal, sid-regex and
decky-mismatch rejection for /transcripts; mirror coverage for
/attackers/{uuid}/transcripts. Flips the Session Recording box in the
roadmap (sessrec pty relay now shipping end-to-end).
2026-04-21 23:11:40 -04:00
4596c1d69a feat(templates): add sessrec pty transcript recorder
New decnet/templates/_shared/sessrec/ — a small C program installed as the
login shell in SSH / Telnet deckies. Forkpty-relays /bin/bash, records each
chunk as an asciinema v2 event into a shared JSONL day-shard keyed by sid,
and emits one RFC 5424 session_recorded line on exit (direct to PID 1's
stdout, same pattern syslog_bridge.py uses).

Storage: one shard per (decky, UTC day) at
/var/lib/systemd/coredump/transcripts/sessions-YYYY-MM-DD.jsonl. Concurrent
appends are lock-free: each write is chunked below PIPE_BUF so O_APPEND
interleaves atomically. Per-session cap 10 MB with a trunc sentinel; disk-
free precheck (<200 MB) falls through to plain bash with a session_skipped
log event. Attacker src_ip resolves from \$SSH_CONNECTION, getpeername(0),
or utmp in that order. SIGWINCH appends a 'r' resize event so ncurses
replays stay aligned.

Stealth for v1: /etc/passwd shell-swap to /usr/libexec/login-session
(plausible login-machinery path) + prctl comm disguise. Full LD_PRELOAD
argv-zap is deferred — sshd strips LD_PRELOAD from the session env, so
wiring the existing argv_zap.so into this path needs a separate wrapper.

DEBT-033 opened for size-based day-shard rotation; v1's disk-free precheck
covers the worst case but can be blinded by a one-shot disk fill.
2026-04-21 22:56:42 -04:00
cf5ba5cf2a docs(debt): open DEBT-032 — prober can't detect fingerprint rotation
The mutation-event stream landed this session closes the "deckies are
atomic nodes" gap for service-list changes, but substrate identity is
really ``(service, implementation_fingerprint)``.  A base-image
rebuild that rotates OpenSSH 8.4 → 9.2 without changing the service
list is invisible to the correlation graph today because the prober's
dedup set is in-memory and per-run — no cross-run diff, no
"fingerprint changed" event.

DEBT-032 documents the required piece: a per-(decky, service,
probe_type) persistence layer + diff-on-change emission, with the
correlator's existing mutation-marker interleaving pattern as the
model for fingerprint markers.  A mutation-vs-fingerprint divergence
detector then falls out of the data model for free — fingerprint drift
without a preceding mutation ⇒ substrate_divergence finding.
2026-04-21 19:38:41 -04:00
f76fc09caf docs(debt): mark DEBT-031 resolved; document deferrals
All nine service workers now participate in the host-local bus: sniffer,
prober, correlator (via profiler), profiler, collector, ingester, agent,
forwarder, updater.  Pre-bus behavior is preserved end-to-end for
DECNET_BUS_ENABLED=false and get_bus() failures.

Three items intentionally deferred: realism-probe decky.{id}.state
(needs a realism probe path that doesn't exist yet), correlator session
boundaries (needs session state), and bus-wake subscriptions (publishes
landed; wake side wired to no subscriber today).
2026-04-21 17:02:57 -04:00
e083bbe17c docs(debt): add DEBT-031 — workers publish/subscribe to bus if available
Per-worker integration of the service bus shipped in DEBT-029. Publishes
are fire-and-forget; subscribes wake polling loops. Bus stays optional —
if get_bus() fails or DECNET_BUS_ENABLED=false, workers log once and
continue in poll-only mode (mirrors decnet/mutator/engine.py:run_watch_loop).
2026-04-21 14:49:45 -04:00
d97a32e2d0 docs(dev): resolve DEBT-030 phase A + add mutator-family bus smoke
- scripts/bus/smoke-mutator.sh: boots decnet bus, subscribes to
  topology.>, publishes one event per mutation-lifecycle state plus
  a topology.status transition, asserts all four land on the
  subscriber. Cheap E2E for the topic hierarchy the mutator + SSE
  route rely on.
- development/DEBT.md: mark DEBT-030  resolved (Phase A) with a
  summary of what shipped; flag the optimistic staged-buffer editor
  as Phase B follow-up, not debt.
2026-04-21 14:39:25 -04:00
fbf289ff63 feat(bus): host-local UNIX-socket pub/sub worker (DEBT-029)
Land the `decnet bus` worker and `get_bus()` factory. Transport is a
host-local UNIX-domain socket (0660, group=decnet); authz is the file
mode. Wire framing is a tiny verb-line + 4-byte-BE length + orjson body.
NATS-style wildcard topics (`*`, `>`). At-most-once, fire-and-forget —
DB stays the source of truth. `FakeBus` / `NullBus` for tests and the
disabled path. Cross-host federation is deferred to a future
`--bridge-tcp` mode; DEBT-030 is master-only and unblocked.
2026-04-21 13:49:02 -04:00
4481a947d4 docs(dev): tick shipped items on the roadmap
Plugin SDK docs and the 250-user / 100-req-per-second API targets are
met; mark them done.
2026-04-21 10:24:50 -04:00
8dd4c78b33 refactor: strip DECNET tokens from container-visible surface
Rename the container-side logging module decnet_logging → syslog_bridge
(canonical at templates/syslog_bridge.py, synced into each template by
the deployer). Drop the stale per-template copies; setuptools find was
picking them up anyway. Swap useradd/USER/chown "decnet" for "logrelay"
so no obvious token appears in the rendered container image.

Apply the same cloaking pattern to the telnet template that SSH got:
syslog pipe moves to /run/systemd/journal/syslog-relay and the relay
is cat'd via exec -a "systemd-journal-fwd". rsyslog.d conf rename
99-decnet.conf → 50-journal-forward.conf. SSH capture script:
/var/decnet/captured → /var/lib/systemd/coredump (real systemd path),
logger tag decnet-capture → systemd-journal. Compose volume updated
to match the new in-container quarantine path.

SD element ID shifts decnet@55555 → relay@55555; synced across
collector, parser, sniffer, prober, formatter, tests, and docs so the
host-side pipeline still matches what containers emit.
2026-04-17 22:57:53 -04:00
edc5c59f93 docs(profiles): archive locust run artifacts under development/profiles
Commit-by-commit evidence of the perf work: each CSV is the raw
Locust output for the commit hash in its filename, plus the four
fb69a06 variants (single worker, tracing on/off, single-core pinned,
12 workers) referenced in the README baseline table.
2026-04-17 22:05:35 -04:00
257f780d0f docs(bugs): document SSE /api/v1/stream BrokenPipe storm (BUG-003) 2026-04-17 17:48:42 -04:00
3945e72e11 perf: run bcrypt on a thread so it doesn't block the event loop
verify_password / get_password_hash are CPU-bound and take ~250ms each
at rounds=12. Called directly from async endpoints, they stall every
other coroutine for that window — the single biggest single-worker
bottleneck on the login path.

Adds averify_password / ahash_password that wrap the sync versions in
asyncio.to_thread. Sync versions stay put because _ensure_admin_user and
tests still use them.

5 call sites updated: login, change-password, create-user, reset-password.
tests/test_auth_async.py asserts parallel averify runs concurrently (~1x
of a single verify, not 2x).
2026-04-17 14:52:22 -04:00
c1d8102253 modified: DEVELOPMENT roadmap. one step closer to v1 2026-04-16 11:39:07 -04:00