DECNET

Author	SHA1	Message	Date
anti	943bb3a39d	docs(identity): resolve merge revocability + SSE open questions Open Question 1 (merge revocability): adopted. The clusterer will clear merged_into_uuid on contradicting evidence and publish a new identity.unmerged topic alongside the existing three identity.* topics so subscribers on identity.> get it from day one. Open Question 2 (AttackerDetail UX on identity_id change): adopted SSE over refresh-on-focus. New endpoint will mirror the existing topology mutator SSE (bus subscription on identity.>, JWT via ?token=, snapshot-on-connect + live forward). Risk 2 (API URL stability for soft-merged loser UUIDs): struck — already shipped in commit `dc3d08d` (read-only API follows merged_into_uuid and surfaces the canonical winner).	2026-04-26 07:33:36 -04:00
anti	7904ef1308	docs(identity): IDENTITY_RESOLUTION.md design spec Pre-implementation design for the observation/identity/campaign three-level hierarchy. Sibling-add approach (not rename) — keep the attackers table name, add attacker_identities as a sibling, nullable attackers.identity_id FK. Documents the rationale, schema, bus topics, API surface, and the 5-commit implementation sequence. Companion to development/CAMPAIGN_CLUSTERING.md. Substrate for the clusterer worker designed there; ships empty so the campaign clustering fixtures can encode honest multi-row-per-actor scenarios.	2026-04-26 06:56:40 -04:00
anti	00254629f8	feat(clustering): UKC phase enum + synthetic campaign factory + metric harness Pre-implementation scaffolding for campaign clustering. The simulator is the spec — algorithm code follows once fixtures + metrics are stable. * decnet/clustering/ukc.py — UKCPhase enum (19 phases across In/Through/Out stages), OBSERVABLE_PHASES set, stage_of() helper. Vocabulary aligns with future MITRE ATT&CK tagging so synthetic data and runtime phase inference don't need renaming when TTP-tagging lands. * tests/factories/campaign_factory.py — YAML DSL parser + deterministic generator emitting truth-labeled SyntheticAttacker / SyntheticSession records. Validates phase names, warns on unobservable phases, supports multi-campaign + noise corpora. * tests/clustering/metrics.py — pure-Python ARI / homogeneity / completeness / singleton_recall (no sklearn dep). Decided before any algorithm exists, on purpose. * tests/fixtures/campaigns/lone_wolf.{yaml,expected.yaml} — fixture 3 from the design doc; simplest of the six, exercises the full pipeline with an identity-clusterer placeholder. * development/CAMPAIGN_CLUSTERING.md — design spec for the feature. * development/DEVELOPMENT_V2.md — note on DSL evolution path (concurrent phases, multi-actor per phase) deferred post-v1.	2026-04-26 06:29:10 -04:00
anti	3eb67c9400	refactor(intel): re-key attacker_intel on attacker_uuid (closes DEBT-041) The threat-intel surface was IP-keyed on day one as an expedient — the worker is woken by IP-bearing bus events. ANTI's call: don't carry that debt. NO IPs as primary keys anywhere on the attacker-intel surface. Schema: - attacker_uuid is now the canonical key — UNIQUE + FK to attackers.uuid. - attacker_ip stays as a denormalised, indexed, NON-UNIQUE value column. Updated on every upsert; useful for SIEM payloads and audit lookups, but explicitly NOT a key. Model docstring says so. - Pre-v1, no Alembic migration needed. SQLModel.metadata.create_all() builds the new shape on fresh DBs. Repo: - upsert_attacker_intel now keys on attacker_uuid. - get_attacker_intel_by_ip → get_attacker_intel_by_uuid. - get_unenriched_attacker_ips → get_unenriched_attackers, returning [{uuid, ip}] tuples so the worker writes by UUID and dispatches provider calls by IP without a second round-trip. Worker: - _enrich_one(uuid, ip, ...) — UUID lands on the row, IP rides for provider egress. - attacker.intel.enriched bus payload gains attacker_uuid alongside attacker_ip — webhook → SIEM consumers benefit; no removal. API: - GET /api/v1/attackers/{ip}/intel deleted outright (rip-and-replace, never deployed beyond dev). - GET /api/v1/attackers/{uuid}/intel is the only public route, matching every other /attackers/* route. Frontend: - <IntelPanel uuid={id!} /> uses the URL param directly, fetches in parallel with the rest of AttackerDetail rather than waiting on attacker.ip. Tests: re-keyed in place, 39 passed (same coverage as before the refactor). Provider-impl tests untouched. DEBT-041: closed in DEBT.md (entry preserved as historical rationale, summary table flipped to ✅, remaining-open list shortened by one).	2026-04-26 05:35:29 -04:00
anti	a009549326	feat(web): IntelPanel on AttackerDetail + DEBT-041 entry Read-only IP-keyed intel surface on the attacker detail page. Renders the aggregate verdict (color-coded MALICIOUS/SUSPICIOUS/BENIGN/NO SIGNAL) plus a per-provider row with verdict, queried-at timestamp, and provider-specific detail (GreyNoise classification, AbuseIPDB 0-100 score, Feodo C2 listing + malware family, ThreatFox IOC match + malware family). 404 from the API renders as 'NO INTEL CACHED YET' with a hint that decnet enrich will populate it on the next pass — TTL drives the refresh, no manual button. DEBT-041 documents the API/UI IP-keying as a v1 expedient that will need a UUID-keyed sibling endpoint before federation lands. NAT collisions, attacker.uuid consistency across attacker routes, and the sequential-fetch UX are all callouts on that ticket; the migration sketch is laid out so the v1.x followup is unambiguous. Frontend build: clean (55.58 kB AttackerDetail bundle, +~5kB for the panel). Note: not browser-tested in this session — recommend a manual smoke against a deployed master before tagging.	2026-04-26 05:25:25 -04:00
anti	4ec0dd75c8	docs(roadmap): mark threat-intel enrichment shipped Out-of-band 'decnet enrich' worker landed across commits feat(intel): attacker_intel table → factory → providers → worker → CLI → API. v1 ships GreyNoise Community + AbuseIPDB + abuse.ch (Feodo Tracker bulk feed and ThreatFox per-IP). Shodan / Censys / OTX remain in the DEVELOPMENT_V2 backlog.	2026-04-26 05:18:05 -04:00
anti	0dd3811436	feat(intel): attacker_intel table + repo helpers New TTL-cached threat-intel row keyed by attacker IP, with per-provider verdict/raw/queried_at columns for GreyNoise, AbuseIPDB, abuse.ch Feodo Tracker and ThreatFox. Carries schema_version from day one (federation wire-format precedent set by SessionProfile). Repo gains upsert_attacker_intel, get_attacker_intel_by_ip, and a get_unenriched_attacker_ips backfill primitive that picks fresh + stale rows for the forthcoming 'decnet enrich' worker. Also documents the open-source intel-source backlog in DEVELOPMENT_V2.	2026-04-26 04:56:47 -04:00
anti	5fb7ebe433	docs(debt): close standalone graph-correlator follow-up Library shape (decnet/correlation/) consumed by profiler + reuse correlator is the right model. The `decnet correlate` CLI helper has been removed in the previous commit.	2026-04-26 04:26:49 -04:00
anti	bf87f8794a	feat(dashboard): credential reuse tab, drawer, and bidirectional badge Adds a CREDS/REUSE tab segment on the Credential Vault page. The REUSE tab lists CredentialReuse rows (paginated 25 per page) ordered by target_count desc; row-click opens a drawer mirroring the credentials inspector with a deckies x services grid, attacker links, and a PROFILING PENDING placeholder when attacker_uuids has not been backfilled yet. The CREDS tab gains a REUSE column showing a clickable target-count badge for credentials whose (sha256, kind, principal) tuple matches a reuse row; clicking the badge fetches and opens that row's drawer. Section header gains a manual refresh button (no SSE/polling). Ticks the credential-reuse line in DEVELOPMENT.md and notes the vectorstore scaffold.	2026-04-26 03:55:56 -04:00
anti	b3d1301925	feat(creds): DEBT-040 Phase 3 — RDP NLA / CredSSP NTLMv2 capture When RDP_ENABLE_NLA=true (service_cfg.nla=true on the topology side), confirm PROTOCOL_HYBRID on the X.224 Connection Confirm, upgrade the socket to TLS using a self-signed cert generated at first start by the entrypoint, then drive a tiny CredSSP loop: - Read inbound TSRequest DER (bounded to MAX_TSREQUEST_LEN). - Scan for the NTLMSSP signature, dispatch on message type: Type 1 -> respond with a hand-built TSRequest carrying our Type 2 challenge. Type 3 -> parse_type3() and emit auth_attempt with the universal credential SD shape (secret_kind = ntlmssp_v2). - Hand-built DER: no pyasn1 dependency. Also folds in a small fix-up to commit 1: SMB SERVER_CHALLENGE was hardcoded to 0x11..0x88 across the fleet, which would let a scanner fingerprint every DECNET decky by its NTLM challenge. Both SMB and RDP now derive the 8-byte challenge from instance_seed.random_bytes(8, "ntlm_challenge"), giving each decky a deterministic-but-distinct value. SMB Dockerfile gets the instance_seed.py copy too (was synced into the build context but not COPYed into the image). - decnet/services/rdp.py: optional service_cfg.nla bool flips RDP_ENABLE_NLA in the compose env. - decnet/templates/rdp/Dockerfile + entrypoint.sh: openssl install + per-decky cert generation gated on RDP_ENABLE_NLA. - 9 NLA unit tests cover the DER reader/builder, _handle_nla round-trip with Type 1 / Type 3, oversized-DER rejection, and per-NODE_NAME challenge divergence. - DEBT.md: DEBT-040 closed; full TS_INFO_PACKET capture documented as a follow-up if attacker telemetry justifies it.	2026-04-25 07:42:52 -04:00
anti	afe02af5c2	feat(creds): NTLMSSP Type 3 parser + DEBT-040 for SMB/RDP/NLA framers Ships the load-bearing primitive both Phase 5 (SMB) and Phase 7 (RDP NLA) need: a standalone NTLMSSP Type 3 (AUTHENTICATE_MESSAGE) parser per MS-NLMP §2.2.1.3. Surface: parse_type3(blob) -> dict | None find_ntlmssp(buf) -> int # locate NTLMSSP\0 inside SPNEGO outer Returns the universal Credential SD shape: username + domain (decoded UTF-16-LE or ASCII per NEGOTIATE_UNICODE) principal = "DOMAIN\\username" secret_kind = "ntlmssp_v1" (24-byte fixed) or "ntlmssp_v2" (variable) secret_b64 = base64 of NtChallengeResponse — canonical hashcat input (-m 5500 v1, -m 5600 v2) Bounds-checked for untrusted-input safety. Anonymous binds (empty NT response) return None — no credential to record. 7 unit tests cover NTLMv1/v2 distinction, ASCII vs Unicode strings, empty-domain shape, malformed signature/type rejection, and SPNEGO-wrapped find_ntlmssp() lookup. DEBT-040 opens to track the three remaining protocol framers that will consume this parser: - SMB: hand-rolled SMB2 + Session Setup framer (~200 LoC) replacing Impacket's opaque SimpleSMBServer - RDP basic auth: TPKT/X.224/MCS framer for legacy plaintext path (~150 LoC) - RDP NLA: TLS upgrade + CredSSP TSRequest parser, reuses parse_type3 via the SPNEGO inner blob (~250 LoC) These are substantial protocol implementations each — landing them inline with Phase 1-3+6's cred coverage rollout would have inflated the session beyond reasonable scope. Cred-reuse analytics already work across the 12 services covered in this session; the deferred three just round out the fleet.	2026-04-25 07:19:30 -04:00
anti	6b16c844b6	fix(creds): MQTT regression + secret_kind for hash credentials Honest correction to the "every cred-emitting service" claim. Audit of templates/* found three gaps: 1. MQTT — was working through the legacy adapter, silently dropped when Phase 3 (`e696c2b`) deleted it. Now migrated to encode_secret() alongside the others. 2. Postgres — `auth, pw_hash=…` event captures the MD5 challenge-response the attacker sent. Plaintext irrecoverable, so it never fit the (principal, secret_b64=raw_bytes) shape. Lands in Credential as secret_kind="postgres_md5_challenge". 3. VNC — `auth_response, response=…hex` event captures the 16-byte DES-encrypted challenge. Same situation as Postgres: plaintext irrecoverable. Lands as secret_kind="vnc_des_response". Adds a `secret_kind` discriminator column to Credential (default "plaintext", indexed). The dedup tuple gains secret_kind so two credentials with the same sha256 but different kinds are fundamentally different rows — different challenges produce different bytes for the same plaintext password, so cross-kind reuse matches are meaningless and would only confuse analytics. The model now genuinely covers every cred-emitting service in the fleet: plaintext: SSH, Telnet, FTP, POP3, IMAP, SMTP, Redis, LDAP, MQTT; postgres_md5_*: Postgres; vnc_des_response: VNC. Username-only services (MySQL/MSSQL — TDS pre-encryption captures the user but never sees the password byte) intentionally don't feed Credential — they're recon signals, not cred attempts. 40 tests pass in the touched scope. New cases: secret_kind dedups independently in the repo; Postgres MD5 + VNC DES emitters thread through; MQTT round-trips through the native branch.	2026-04-25 06:16:57 -04:00
anti	e696c2beb3	refactor(ingester): drop legacy cred adapter — DEBT-039 closed Phase 3/3 of DEBT-039. Now that all eight cred-emitting services (SSH, Telnet, FTP, POP3, IMAP, SMTP, Redis, LDAP) emit the universal `secret_b64`-bearing SD shape, the ingester's legacy fork has no live emitters to handle. Deletes: - `_ingest_credential_legacy()` — synthesized native fields from username+password - The `elif _fields.get("username") and _fields.get("password")` branch in `_extract_bounty` - `_printable_filter()` — only the legacy adapter called it; the native branch trusts the emitter (encode_secret() in Python or sd_escape() in C) to have already sanitized - The legacy-adapter test cases in tests/web/test_ingester.py; their coverage moved to tests/services/test_cred_emitters.py per-service in Phase 2 The cred path is now single-shape end-to-end. A pre-migration log row carrying only username+password silently produces no Credential write — by design, since no current emitter writes that shape and keeping a code path alive for theoretical legacy data risks masking emitter regressions. Pre-v1: any historical Bounty cred rows from before commit `2f47f67` stay untouched. DEBT-039 marked resolved with summary of the three commits and the silent-loss bug fix for Redis + LDAP that fell out of execution.	2026-04-25 06:04:09 -04:00
anti	2f47f67eef	feat(creds): future-proof Credential storage model Replaces the opaque Bounty.bounty_type='credential' path with a dedicated `credentials` table whose schema is forward-compatible across every auth-bearing service in the fleet. Hoisted indexed columns (secret_sha256, principal, service, attacker_ip) carry the universal reuse-analytics signal; service-specific JSON keys ride in `fields`. Cross-service reuse queries become an indexed lookup on secret_sha256 instead of JSON_EXTRACT scans. Schema decisions baked in (per ANTI): - New `Credential` table, not extension to Bounty - Hoisted `principal` column for cross-service principal-reuse - Standardized JSON keys: every payload carries secret_b64 + secret_printable + principal universally; service-specific extras (user, domain, dn, mech, …) ride alongside The auth-helper SD-block emits the new shape natively. The ingester forks at _extract_bounty: - Native shape (SSH/Telnet, future emitters): secret_b64 present → direct upsert_credential - Legacy shape (FTP/POP3/IMAP/SMTP today): username + password → adapter synthesizes secret_{b64,sha256,printable} on the fly, upserts into the same Credential table. Tracked as DEBT-039; one-shot bridge until those service templates migrate. 
Defense-in-depth across five layers (input validation): - C helper: bytes outside [0x20, 0x7f) collapse to '?', RFC 5424 escape rules for \, ", ]; b64 preserves exact bytes - Ingester native branch: rejects malformed secret_b64 (regex), drops the credential row but keeps the underlying Log - Ingester legacy adapter: same printable-ASCII filter as the C code; sha256 + b64 over the original utf-8 bytes (lossless, even when secret_printable is sanitized) - DB column caps with truncation warning; sha256 always over the full pre-truncation bytes so reuse queries match across truncation - JSON serialized with ensure_ascii=True so utf8mb4 columns stay safe even with non-ASCII service-specific keys Bounty.bounty_type='credential' is no longer written. Pre-v1: no historical backfill; existing rows stay untouched but unused. 595 tests pass; new tests cover the model + repo (upsert dedup, null-principal independence, cross-service reuse, filters), both ingester branches, b64 validation, sanitization preserving the fingerprinting signal in b64.	2026-04-25 05:29:26 -04:00
anti	50c12d9e16	docs(debt): DEBT-038 #5 closed by telnet extension `f1026b4`	2026-04-25 04:53:04 -04:00
anti	f5a9e10bdc	docs(debt): DEBT-038 SSH PAM cred-capture limitations	2026-04-25 04:44:44 -04:00
anti	c69fdbb4ac	docs(roadmap): mark ASN lookup, GeoIP mapping, PTR records shipped	2026-04-25 04:03:11 -04:00
anti	77a19ffe9f	docs(roadmap): mark MazeNET SWARM topology deployment shipped	2026-04-25 03:42:32 -04:00
anti	2bcef50ac5	feat(webhooks): circuit breaker auto-disables misbehaving subscriptions After DECNET_WEBHOOK_CIRCUIT_THRESHOLD (default 5) consecutive failed deliveries, the worker calls trip_webhook_circuit(uuid, ts) which flips enabled=False and stamps auto_disabled_at. The worker sets its reload flag so the next dispatch epoch stops consuming events for the tripped sub entirely — one dead receiver can't poison the shared egress pool anymore. Operator clears the trip via PATCH — setting enabled=True when the sub was previously disabled clears auto_disabled_at, zeros consecutive_failures, and clears last_error. Admin-pause → re-enable hits the same path harmlessly. Three observable states now distinguishable in the UI: - Active enabled=True, auto_disabled_at=NULL - Admin-paused enabled=False, auto_disabled_at=NULL - Tripped enabled=False, auto_disabled_at=<ts> UI surfaces a TRIPPED · <ts> chip on the row (red, alert-styled) and a "N TRIPPED" count in the page header. Hover tooltip tells the operator how to reset ("Re-enable via Edit"). record_webhook_failure now returns the new consecutive_failures count so the worker can compare against the threshold without a second roundtrip. trip_webhook_circuit is idempotent — re-tripping just re-stamps auto_disabled_at. Closes THREAT_MODEL WH-02 and DEBT-037 §1.	2026-04-24 16:24:33 -04:00
anti	c2ff8d1a4f	docs(debt): DEBT-037 — webhook delivery guarantees beyond MVP The webhook MVP shipped with deliberate deferrals; this entry names them so future PRs know exactly what's left to close: circuit breaker, dead-letter table, delivery audit log, batch/coalescing, per-subscription rate limiting, payload templates per destination, and secret encryption at rest. Non-negotiable even at MVP scope (HMAC signing, bus-off degraded mode, jittered retry backoff) is called out explicitly to prevent future contributors from weakening it under the banner of "simplification."	2026-04-24 16:03:33 -04:00
anti	638236113d	feat(webhooks): non-blocking http:// warning + WH-03 accepted risk WebhookResponse now carries a `warnings: list[str]` field. When the subscription's URL starts with http://, an `insecure_url` advisory is surfaced on every GET/CREATE without blocking the request. HMAC still detects tampering regardless of transport — only read-confidentiality is lost over plaintext — and test/dev environments without TLS stay usable. Matches the operator-trust posture already established by DA-06 (admin-on-admin protection is out of scope). The alternative — hard rejection at admin time — was considered and declined; warning-plus-visibility is the right shape. THREAT_MODEL WH-03 accepted risk registered; revisit triggers are multi-admin delegation, a regulated customer, or an operator ticket asking for a DECNET_WEBHOOK_REQUIRE_HTTPS enforcement knob.	2026-04-24 15:53:30 -04:00
anti	f84bf82f6c	docs(webhook): roadmap tick + threat-model component - DEVELOPMENT.md: tick the "Real-time alerting" roadmap item with a note that Slack/Telegram-specific senders remain per-destination follow-ups (they accept generic webhook payloads already). - THREAT_MODEL.md: new Component 2 — DECNET↔External webhook destination. DFD, full STRIDE table, WH-01 (secret at rest) and WH-02 (half-dead-receiver retry waste) registered as accepted risks pointing at DEBT-037 for post-MVP hardening. Checklist lists two open items: OpenAPI schema omits `secret`, and http:// URL rejection at admin time.	2026-04-24 15:48:14 -04:00
anti	162f7c1194	feat(api/sse): per-user connection cap + viewer-safe invariant New decnet/web/sse_limits.py provides sse_connection_slot, an async context manager that counts live SSE connections per user UUID and raises 429 when a per-user cap is exceeded (default 5, override via DECNET_SSE_MAX_PER_USER). Wired into both SSE generators as their first async with, so the cap check fires before any stream data is yielded. The cap must sit inside the generator — StreamingResponse returns before the generator body runs, so a handler-level wrapper would release the slot immediately. Put prefetch + slot + loop all under the one async with. Also documents F6/I (role leakage) as mitigated-by-construction via handler docstrings: every event type on both streams wraps data already reachable via viewer-gated REST, so no per-event filter is needed until a new event family is introduced. The invariant is written into the handler docstrings so a future PR can't silently add admin-only events. Resolves THREAT_MODEL F6/I and F6/D.	2026-04-24 15:01:20 -04:00
anti	df84981954	feat(api): pin response_model on dict-returning mutation routes Every mutation route that returned an untyped dict now declares response_model at the decorator. MessageResponse covers the eight {"message": ...} envelopes (change-password, mutate-decky, mutate-interval, update-deployment-limit, update-global-mutation-interval, delete-user, update-user-role, reset-user-password). Purpose-built models cover the richer shapes (DeployResponse for /deckies/deploy, PurgeResponse for /config/reinit, ReapReportResponse for /reap-orphans, UserResponse for /config/users). 204-No-Content and Response/ORJSONResponse routes stay as-is. The wire shape for clients is unchanged — the envelopes already only shipped a message field. What changes is that a handler which accidentally returns a richer dict (e.g. a full user row including password_hash) would be silently stripped to the declared fields at serialization time. Also flips F4/D "expensive LIKE" to accepted (new DA-09) — the /logs and /attackers search routes LIKE-scan unbounded columns, but both are admin-gated, limit-capped, and operator rate-limit scope per DA-04. FTS5 stays a performance TODO, not a security blocker.	2026-04-24 14:27:58 -04:00
anti	a935bf2663	feat(api): cap offset on list-topologies and transcript endpoints The other five query endpoints (/logs, /attackers, /attacker-commands, /bounties, /topologies/{id}) already declared le=2147483647 on offset; these two were inconsistently uncapped. Bring them in line to close the F4/D deep-pagination row. Also resolves F4/T (ORM sort injection — already mitigated by the regex pattern on /attackers sort_by, no other route accepts a column name) and F4/D (limit cap — already universal) with code pointers.	2026-04-24 14:14:25 -04:00
anti	e53b580767	test(api): RBAC contract test — viewer JWT on every classified route New test walks app.routes, classifies each APIRoute as admin/viewer/open by identity-matching require_admin / require_viewer closures inside the route's dependency tree, then asserts: - admin routes return 403 to a viewer JWT - viewer routes return neither 401 nor 403 to a viewer JWT SSE routes skipped (separate scope under F6). Role hints deliberately NOT encoded in the OpenAPI spec — classification stays server-side so /openapi.json can't be used to enumerate admin routes. Resolves THREAT_MODEL F2/I + F5/E; paired with the existing test_schemathesis.py::test_auth_enforcement (401-half coverage).	2026-04-24 14:00:12 -04:00
anti	99ccd41bb5	feat(api/artifacts): explicit Content-Disposition + X-Content-Type-Options Harden the attacker-controlled artifact download path (F7) with explicit response headers instead of relying on Starlette's defaults (which only emit attachment for non-ASCII filenames and never set nosniff). Also resolves the THREAT_MODEL F7 path-traversal row (containment check was already in _resolve_artifact_path) and the fleet-deploy detail=str(e) audit (all four sites are admin-gated deliberate validator UX or structured worker-response fields).	2026-04-24 13:24:34 -04:00
anti	3787f7e5ec	docs(debt): DEBT-036 — session-profile ingester (keystroke dynamics) The SessionProfile SQLModel table has shipped with every column nullable since session-recording v1 landed — because the ingester that populates them from the [t,"i",d] events in the transcript shards does not exist yet (known as gap #2 in SIGNAL_CAPTURE_AUDIT). A manual keystroke-dynamics pass over one real session (wget scanme. nmap.orgh) trivially recovered CoV ≈ 0.74 (human band), a 467 ms semantic pause before the URL argument, tight intra-word bigrams (ge 79 ms, t<space> 83 ms), and slow start-of-action latency (w→g 225 ms) — all signals the existing schema columns were designed to hold. So the missing piece is purely the ingester. Entry captures: - the manual case as the motivating + sanity-check target (ingester should produce CoV ≈ 0.74 ± 0.05 on the same shard), - three schema extensions the manual analysis suggests beyond what the table carries today: kd_start_of_action_latency_ms, kd_pause_hist_{burst,think,distracted}, kd_top_bigrams, - a non-PII discipline line: raw keystroke content (including captured passwords) MUST NOT land in SessionProfile columns — only timing and frequency aggregates. Poll-driven ingestion can ship first; the bus-trigger path piggybacks on DEBT-031's deferred session-boundary topics.	2026-04-24 10:41:55 -04:00
anti	ec2360a5da	docs(debt): DEBT-035 — artifacts written as the container uid, not the API's Tracks the durable follow-up to `323077b`. The transcripts soft-fail shipped in that commit keeps the API from 500-ing on /var/lib/decnet/artifacts/** permission mismatches, but the real issue is that decoy containers write artifacts under a uid the API can't read — today's workaround is a manual `sudo chown -R` after every new deploy. Three design options documented (container-runs-as-host-uid, setgid + shared group, inotify sidecar) with a recommendation, plus an acceptance criterion: fresh init + deploy + record session → the API can read the transcripts with no manual chown.	2026-04-24 01:21:09 -04:00
anti	a6356abe27	docs(dev): post-v1 roadmap + check off shipped "Commands executed" item - DEVELOPMENT_V2.md (new): post-v1 direction. Everything here is after the v1 box is closed — federation, advanced behavioral profiling, maze-scale topology work. - DEVELOPMENT.md: flip "Commands executed" checkbox — full per-session command log already landed in the profiler's _extract_commands_from_events path.	2026-04-23 21:52:15 -04:00
anti	ef4179ea1f	feat(api): opaque 500 handler + error_id correlation for unhandled exceptions Registers a generic @app.exception_handler(Exception) that catches anything uncaught in route handlers / dependencies. Prod response is opaque: {detail: 'Internal Server Error', error_id: <uuid4 hex>}. Dev mode (DECNET_DEVELOPER=True) adds exception_type and traceback fields so failures are debuggable without tailing server logs. The error_id is logged alongside the full traceback server-side, letting operators correlate a user's 500 report with the exact exception via `grep <error_id> /var/log/decnet.log`. FastAPI's own HTTPException routing and the existing RequestValidationError / ValidationError / RateLimitExceeded handlers still take precedence — this handler only fires on genuinely-uncaught exceptions. Flips threat model F1/I 'traceback / stack trace leakage' from ? to M and logs a follow-up checklist entry for 4 detail=str(e) sites in the fleet deploy router (admin-gated, different threat class, separate audit).	2026-04-23 14:07:32 -04:00
anti	2f4f81e5de	feat(api): rate-limit /auth/login + scaffold threat model Adds slowapi two-bucket rate limit on /auth/login — 10 attempts per 5 minutes per-IP AND per-username, tripping either → 429. Per-IP catches botnets hitting one account; per-username catches distributed credential stuffing against one account. In-memory storage: dashboard API is single-process, Redis is disproportionate for v1. X-Forwarded-For is deliberately NOT trusted (spoofable); reverse-proxy deployments get one shared bucket per proxy IP. Logged in the threat model as accepted risk DA-08, to be revisited when a verified-proxy config lands. Also scaffolds development/THREAT_MODEL.md with STRIDE-per-element methodology, system-context DFD, and Dashboard↔API as the first fully worked component (7 sub-flows, ~50 threat entries). F1 Authn ships with 3 threats mitigated: rate limit (new), uniform 401 (verified already in place), bcrypt length clamp (verified already in place via Pydantic max_length=72).	2026-04-23 13:25:28 -04:00
anti	6d769edce0	docs(debt): mark DEBT-034 (worker supervisor) shipped Units + polkit rule + systemd_control helper + start endpoints + installed flag + UI wiring all landed. SWARM-host start/stop and crash-quarantine policy stay as named deferrals.	2026-04-22 14:14:22 -04:00
anti	6725197d58	test(web): transcripts API + attacker-transcripts router coverage Paging, truncation surfacing, admin gate, path traversal, sid-regex and decky-mismatch rejection for /transcripts; mirror coverage for /attackers/{uuid}/transcripts. Flips the Session Recording box in the roadmap (sessrec pty relay now shipping end-to-end).	2026-04-21 23:11:40 -04:00
anti	4596c1d69a	feat(templates): add sessrec pty transcript recorder New decnet/templates/_shared/sessrec/ — a small C program installed as the login shell in SSH / Telnet deckies. Forkpty-relays /bin/bash, records each chunk as an asciinema v2 event into a shared JSONL day-shard keyed by sid, and emits one RFC 5424 session_recorded line on exit (direct to PID 1's stdout, same pattern syslog_bridge.py uses). Storage: one shard per (decky, UTC day) at /var/lib/systemd/coredump/transcripts/sessions-YYYY-MM-DD.jsonl. Concurrent appends are lock-free: each write is chunked below PIPE_BUF so O_APPEND interleaves atomically. Per-session cap 10 MB with a trunc sentinel; disk-free precheck (<200 MB) falls through to plain bash with a session_skipped log event. Attacker src_ip resolves from $SSH_CONNECTION, getpeername(0), or utmp in that order. SIGWINCH appends a 'r' resize event so ncurses replays stay aligned. Stealth for v1: /etc/passwd shell-swap to /usr/libexec/login-session (plausible login-machinery path) + prctl comm disguise. Full LD_PRELOAD argv-zap is deferred — sshd strips LD_PRELOAD from the session env, so wiring the existing argv_zap.so into this path needs a separate wrapper. DEBT-033 opened for size-based day-shard rotation; v1's disk-free precheck covers the worst case but can be blinded by a one-shot disk fill.	2026-04-21 22:56:42 -04:00
anti	cf5ba5cf2a	docs(debt): open DEBT-032 — prober can't detect fingerprint rotation The mutation-event stream landed this session closes the "deckies are atomic nodes" gap for service-list changes, but substrate identity is really ``(service, implementation_fingerprint)``. A base-image rebuild that rotates OpenSSH 8.4 → 9.2 without changing the service list is invisible to the correlation graph today because the prober's dedup set is in-memory and per-run — no cross-run diff, no "fingerprint changed" event. DEBT-032 documents the required piece: a per-(decky, service, probe_type) persistence layer + diff-on-change emission, with the correlator's existing mutation-marker interleaving pattern as the model for fingerprint markers. A mutation-vs-fingerprint divergence detector then falls out of the data model for free — fingerprint drift without a preceding mutation ⇒ substrate_divergence finding.	2026-04-21 19:38:41 -04:00
anti	f76fc09caf	docs(debt): mark DEBT-031 resolved; document deferrals All nine service workers now participate in the host-local bus: sniffer, prober, correlator (via profiler), profiler, collector, ingester, agent, forwarder, updater. Pre-bus behavior is preserved end-to-end for DECNET_BUS_ENABLED=false and get_bus() failures. Three items intentionally deferred: realism-probe decky.{id}.state (needs a realism probe path that doesn't exist yet), correlator session boundaries (needs session state), and bus-wake subscriptions (publishes landed; wake side wired to no subscriber today).	2026-04-21 17:02:57 -04:00
anti	e083bbe17c	docs(debt): add DEBT-031 — workers publish/subscribe to bus if available Per-worker integration of the service bus shipped in DEBT-029. Publishes are fire-and-forget; subscribes wake polling loops. Bus stays optional — if get_bus() fails or DECNET_BUS_ENABLED=false, workers log once and continue in poll-only mode (mirrors decnet/mutator/engine.py:run_watch_loop).	2026-04-21 14:49:45 -04:00
anti	d97a32e2d0	docs(dev): resolve DEBT-030 phase A + add mutator-family bus smoke - scripts/bus/smoke-mutator.sh: boots decnet bus, subscribes to topology.>, publishes one event per mutation-lifecycle state plus a topology.status transition, asserts all four land on the subscriber. Cheap E2E for the topic hierarchy the mutator + SSE route rely on. - development/DEBT.md: mark DEBT-030 ✅ resolved (Phase A) with a summary of what shipped; flag the optimistic staged-buffer editor as Phase B follow-up, not debt.	2026-04-21 14:39:25 -04:00
anti	fbf289ff63	feat(bus): host-local UNIX-socket pub/sub worker (DEBT-029) Land the `decnet bus` worker and `get_bus()` factory. Transport is a host-local UNIX-domain socket (0660, group=decnet); authz is the file mode. Wire framing is a tiny verb-line + 4-byte-BE length + orjson body. NATS-style wildcard topics (`*`, `>`). At-most-once, fire-and-forget — DB stays the source of truth. `FakeBus` / `NullBus` for tests and the disabled path. Cross-host federation is deferred to a future `--bridge-tcp` mode; DEBT-030 is master-only and unblocked.	2026-04-21 13:49:02 -04:00
anti	4481a947d4	docs(dev): tick shipped items on the roadmap Plugin SDK docs and the 250-user / 100-req-per-second API targets are met; mark them done.	2026-04-21 10:24:50 -04:00
anti	8dd4c78b33	refactor: strip DECNET tokens from container-visible surface Rename the container-side logging module decnet_logging → syslog_bridge (canonical at templates/syslog_bridge.py, synced into each template by the deployer). Drop the stale per-template copies; setuptools find was picking them up anyway. Swap useradd/USER/chown "decnet" for "logrelay" so no obvious token appears in the rendered container image. Apply the same cloaking pattern to the telnet template that SSH got: syslog pipe moves to /run/systemd/journal/syslog-relay and the relay is cat'd via exec -a "systemd-journal-fwd". rsyslog.d conf rename 99-decnet.conf → 50-journal-forward.conf. SSH capture script: /var/decnet/captured → /var/lib/systemd/coredump (real systemd path), logger tag decnet-capture → systemd-journal. Compose volume updated to match the new in-container quarantine path. SD element ID shifts decnet@55555 → relay@55555; synced across collector, parser, sniffer, prober, formatter, tests, and docs so the host-side pipeline still matches what containers emit.	2026-04-17 22:57:53 -04:00
anti	edc5c59f93	docs(profiles): archive locust run artifacts under development/profiles Commit-by-commit evidence of the perf work: each CSV is the raw Locust output for the commit hash in its filename, plus the four `fb69a06` variants (single worker, tracing on/off, single-core pinned, 12 workers) referenced in the README baseline table.	2026-04-17 22:05:35 -04:00
anti	257f780d0f	docs(bugs): document SSE /api/v1/stream BrokenPipe storm (BUG-003)	2026-04-17 17:48:42 -04:00
anti	3945e72e11	perf: run bcrypt on a thread so it doesn't block the event loop verify_password / get_password_hash are CPU-bound and take ~250ms each at rounds=12. Called directly from async endpoints, they stall every other coroutine for that window — the single biggest single-worker bottleneck on the login path. Adds averify_password / ahash_password that wrap the sync versions in asyncio.to_thread. Sync versions stay put because _ensure_admin_user and tests still use them. 5 call sites updated: login, change-password, create-user, reset-password. tests/test_auth_async.py asserts parallel averify runs concurrently (~1x of a single verify, not 2x).	2026-04-17 14:52:22 -04:00
anti	c1d8102253	modified: DEVELOPMENT roadmap. one step closer to v1	2026-04-16 11:39:07 -04:00
anti	49f3002c94	added: docs; modified: .gitignore	2026-04-16 02:10:38 -04:00
anti	70d8ffc607	feat: complete OTEL tracing across all services with pipeline bridge and docs Extends tracing to every remaining module: all 23 API route handlers, correlation engine, sniffer (fingerprint/p0f/syslog), prober (jarm/hassh/tcpfp), profiler behavioral analysis, logging subsystem, engine, and mutator. Bridges the ingester→SSE trace gap by persisting trace_id/span_id columns on the logs table and creating OTEL span links in the SSE endpoint. Adds log-trace correlation via _TraceContextFilter injecting otel_trace_id into Python LogRecords. Includes development/docs/TRACING.md with full span reference (76 spans), pipeline propagation architecture, quick start guide, and troubleshooting.	2026-04-16 00:58:08 -04:00
anti	65ddb0b359	feat: add OpenTelemetry distributed tracing across all DECNET services Gated by DECNET_DEVELOPER_TRACING env var (default off, zero overhead). When enabled, traces flow through FastAPI routes, background workers (collector, ingester, profiler, sniffer, prober), engine/mutator operations, and all DB calls via TracedRepository proxy. Includes Jaeger docker-compose for local dev and 18 unit tests.	2026-04-15 23:23:13 -04:00
anti	0ab97d0ade	docs: document decnet domain models and fleet transformation	2026-04-15 18:01:27 -04:00
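
Commit fbf289ff63 in the log above mentions NATS-style wildcard topics (`*`, `>`), and other entries subscribe to patterns such as `topology.>` and `identity.>`. The sketch below illustrates how such matching conventionally works under NATS rules: `*` matches exactly one dot-separated token, and `>` matches one or more trailing tokens. The function name `topic_matches` is hypothetical and this is not taken from the DECNET source; the real bus matcher may differ in edge cases.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if a dot-separated topic matches a NATS-style pattern.

    Assumed semantics (NATS convention, not confirmed against DECNET):
      '*' matches exactly one token;
      '>' matches one or more trailing tokens and must be the last token.
    """
    p_tokens = pattern.split(".")
    t_tokens = topic.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must close the pattern and needs at least one token to consume.
            return i == len(p_tokens) - 1 and len(t_tokens) > i
        if i >= len(t_tokens):
            # Pattern is longer than the topic.
            return False
        if p != "*" and p != t_tokens[i]:
            # Literal token mismatch.
            return False
    # No '>' seen: token counts must agree exactly.
    return len(p_tokens) == len(t_tokens)
```

Under these assumed rules, a subscriber on `identity.>` would receive a newly added `identity.unmerged` topic without resubscribing, which is the behavior the identity docs commit relies on.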

75 Commits
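
Commit 3945e72e11 in the log above describes wrapping the synchronous `verify_password` / `get_password_hash` in `asyncio.to_thread` so a CPU-bound hash does not stall the event loop. A minimal sketch of that pattern follows. The four function names come from the commit message, but the bodies are assumptions: to stay dependency-free, the standard library's PBKDF2 stands in for bcrypt, and the `salt$digest` storage format is purely illustrative.

```python
import asyncio
import hashlib
import hmac
import os

# Illustrative work factor for the PBKDF2 stand-in; not DECNET's bcrypt rounds=12.
_ITERATIONS = 200_000


def get_password_hash(password: str) -> str:
    """Synchronous, CPU-bound hash (PBKDF2 standing in for bcrypt)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Synchronous, CPU-bound verification against a 'salt$digest' string."""
    salt_hex, digest_hex = stored.split("$", 1)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), _ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


async def ahash_password(password: str) -> str:
    # Run the blocking hash on a worker thread so other coroutines keep running.
    return await asyncio.to_thread(get_password_hash, password)


async def averify_password(password: str, stored: str) -> bool:
    # Same idea for verification; the sync version stays usable elsewhere.
    return await asyncio.to_thread(verify_password, password, stored)


async def demo() -> tuple[bool, bool]:
    stored = await ahash_password("hunter2")
    # Both verifications are scheduled together rather than back to back.
    good, bad = await asyncio.gather(
        averify_password("hunter2", stored),
        averify_password("wrong", stored),
    )
    return good, bad
```

Keeping the synchronous functions alongside the async wrappers matches the commit's note that startup code and tests still call the sync versions directly.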