From 825d7d72c9dbba66a492a7df9d86e44e36ef7720 Mon Sep 17 00:00:00 2001 From: anti Date: Wed, 17 Jun 2026 16:32:54 -0400 Subject: [PATCH] docs(1.1): RAM footprint analysis + release plan Fleet resident set ~2.57GB across 18 workers; ~1.5GB is the 86MB import floor paid 18x. Pinned root cause: topology/__init__ eager re-export of generate drags the full SQLModel ORM (26 tables, ~38MB) into every worker. --- deploy/tmpfiles.d/decnet.conf.j2 | 4 ++ development/RELEASE-1.1.md | 65 ++++++++++++++++++ development/improvements.md | 112 +++++++++++++++++++++++++++++++ 3 files changed, 181 insertions(+) create mode 100644 deploy/tmpfiles.d/decnet.conf.j2 create mode 100644 development/RELEASE-1.1.md create mode 100644 development/improvements.md diff --git a/deploy/tmpfiles.d/decnet.conf.j2 b/deploy/tmpfiles.d/decnet.conf.j2 new file mode 100644 index 00000000..665276b7 --- /dev/null +++ b/deploy/tmpfiles.d/decnet.conf.j2 @@ -0,0 +1,4 @@ +# /run/decnet hosts bus.sock (UDS, 0660, group={{ group }}). +# tmpfiles.d recreates it on every boot before any decnet-*.service starts, +# so the bus worker never silently falls back to ~/.decnet/bus.sock. +d /run/decnet 0755 root {{ group }} - diff --git a/development/RELEASE-1.1.md b/development/RELEASE-1.1.md new file mode 100644 index 00000000..53a8539c --- /dev/null +++ b/development/RELEASE-1.1.md @@ -0,0 +1,65 @@ +# DECNET 1.1 — RAM / Process-Footprint Release + +Predecessor: `v1.0.0`. Theme: cut the fleet resident set from **2.57 GB → target ~1.3 GB** +with near-zero risk, then optionally further via worker consolidation. + +Analysis & measurements: see [improvements.md](improvements.md). + +## Why + +18 long-running workers, ~2.57 GB resident. ~1.5 GB of that is the **same 86 MB import +floor paid 18×**, not workload. The floor is `import decnet.cli` pulling the entire +SQLModel ORM + all 26 model tables into every worker, even ones that never touch the DB. + +## Root cause (pinned) + +``` +cli/__init__.py:22 from . import (... topology ...) + → cli/topology.py:12 from decnet.topology.config import TopologyConfig + → decnet/topology/__init__.py:10 from decnet.topology.generator import generate ← TRIGGER + → generator → allocator → repository → web.db.models.topology → all 26 tables (~38 MB) +``` + +The `topology/__init__.py` eager re-export of `generate` is the single thread that drags +the ORM into every worker. No production code imports `generate` from the package surface +(only tests, and they import the `compose` submodule) — safe to make lazy. + +## Commit plan (incremental, one concern each) + +- [x] **C1 — docs.** `improvements.md` (analysis) + this release plan. +- [ ] **C2 — lazy topology re-export.** PEP 562 `__getattr__` in `topology/__init__.py` + so `generate` loads on access, not on package import. Public API unchanged. + Test: `import decnet.cli` must NOT pull `decnet.web.db.models`. Re-measure floor. +- [ ] **C3 — sweep remaining eager model pulls.** After C2, re-trace `import decnet.cli`; + defer any other command module that drags the ORM in for registration only. + Test: assert idle-worker floor stays under target. +- [ ] **C4 — extract idle-herd coroutines.** Hoist the inline `_run()` closures + (`webhook`, `canary`, `listener`, `forwarder`, `mutate`, `enrich`) into reusable + `async def run(bus, cfg)` in their packages, so they're hostable by a supervisor. + CLI commands become thin `asyncio.run(run(...))` wrappers. No behaviour change. +- [ ] **C5 — `decnet supervise`.** TaskGroup supervisor hosting the idle herd in one + process; reuses existing bus + `system.{worker}.control` shutdown. One systemd unit + replacing ~8. `# ponytail: shared event loop — split a worker back out if it needs + its own restart policy / MemoryMax`. +- [ ] **C6 — merge scapy workers.** Optional. `collect`/`probe`/`sniffer` share the 76 MB + scapy import once instead of 3×. + +## Scope boundaries (no creep) + +- **In:** import-floor reduction (C2–C3), idle-herd consolidation (C4–C5), scapy merge (C6). +- **Out:** `bus` (broker — stays alone), `api`/`web` (already multiprocess), `profiler`/`ttp` + (heavy resident state + real CPU — stay separate). Not touching DB schema, bus wire format, + or worker logic — only *where* code is imported and *which process* hosts it. + +## Risk ladder + +- C2–C3: import-site only, reversible, covered by an import-floor test. **Low.** +- C4: pure extraction, behaviour-preserving, existing tests guard it. **Low.** +- C5: introduces **shared fate** — one crash/OOM takes the herd; loses per-worker systemd + restart + `MemoryMax`. **Medium.** Verify on the live fleet before adopting; keep the + individual units as the fallback. Do C2–C4 first; C5 only if RAM still bites. + +## Projected + +- C2–C3 (import floor): 2.57 GB → **~1.3 GB**. Nearly free. +- C4–C6 (consolidation): → **~0.9 GB**. Costs process isolation. diff --git a/development/improvements.md b/development/improvements.md new file mode 100644 index 00000000..2f3b17a0 --- /dev/null +++ b/development/improvements.md @@ -0,0 +1,112 @@ +# DECNET RAM / Process-Footprint Improvements + +Status: analysis complete, implementation not started. +Measured 2026-06-17 on the dev box, 18 live `decnet` workers, CPython 3.11 (`.311`). + +## Headline + +Fleet resident set ≈ **2.57 GB across 18 processes**. The bulk is not workload — +it is the same import floor paid 18 times over. + +## Part A — the universal import tax (measured) + +Every worker pays **~86 MB at startup before doing any work**: + +``` +interpreter 12 MB ++ import decnet.cli 74 MB ← SQLModel/SQLAlchemy/Pydantic (~32MB) + + EVERY decnet.web.db.models.* table + + decnet.config + decnet.models += floor 86 MB paid 18× ≈ 1.5 GB of the 2.57 GB total +``` + +Measured cold, fresh interpreter each time (RSS): + +| Layer | Resident | Who pays | +|---|---:|---| +| CPython interpreter | ~12 MB | everyone (shared COW) | +| `import decnet.cli` | +74 MB | **every worker** | +| └ SQLModel/SQLAlchemy/Pydantic | ~32 MB | the ORM chain | +| └ all `decnet.web.db.models.*` tables | ~20 MB | eagerly imported | +| `scapy.all` | +76 MB | only `collect`, `probe`, `sniffer` | + +Confirmed NOT in the universal path: +- **scapy** — `scapy loaded after import decnet.cli? False`. Only the sniff/probe workers pay it. +- **pandas / numpy / sklearn** — no module-scope imports anywhere; already lazy-imported + inside the functions that use them. Codebase got this right; leave it. + +### Root cause + +`decnet/cli/__init__.py:22-48` eagerly does `from . import (agent, api, ... ttp)` — +all 26 command modules imported at process start. Each pulls `decnet.config` + +`decnet.models` + the `decnet.web.db.models.*` chain at module top. So `decnet canary` +(which never touches TTP/swarm/webhook tables) still parses every table's SQLModel +metaclass. + +importtime top offenders (pure model-table import cost, self time): +``` +decnet.web.db.models.topology 21ms +decnet.web.db.models.attackers 15ms +decnet.models 13ms +decnet.web.db.models.logs 11ms +... canary, ttp, swarm, auth, webhooks, orchestrator ... +``` + +## Part B — architecture map (for consolidation) + +All 18 workers are **already asyncio coroutines subscribing to one shared UNIX-socket +bus** (`decnet/bus/`), with a `system.{worker}.control` shutdown topic already wired and a +`system.{worker}.health` heartbeat every 10s. They are already independent tasks — nothing +needs re-architecting, only re-hosting. + +| Tier | Workers | Verdict | +|---|---|---| +| Broker | `bus` | Stays alone — it's the hub. | +| Already multiprocess by design | `api`/uvicorn, `web` (ThreadingTCPServer) | Leave them. | +| scapy + blocking sniff threads | `collect`, `probe`, `sniffer` | Keep out of main loop (76 MB scapy + GIL-thrashing threads). **Merge these 3** → pay scapy once. | +| Heavy resident state / CPU | `profiler` (353 MB), `ttp` (308 MB) | Keep separate — big live heaps, real CPU work; co-locating serializes them under GIL. | +| **The idle herd** ⭐ | `webhook`, `canary`, `listener`, `forwarder`, `mutate`, `orchestrator`, `reconciler`, `enrich`, + lighter clusterers | **The prize.** ~10 mostly-idle event-driven tasks each paying the 86 MB floor to `await` a bus event. Collapse into ONE supervisor. | + +Loop-type evidence (from architecture map): + +| Worker | Loop entry | Loop kind | +|---|---|---| +| bus | `cli/bus.py:10` → `bus/worker.py:44` | asyncio serve_forever + heartbeat | +| profiler | `cli/profiler.py:10` → `:33` | asyncio, 30s wakeup, batch 500 logs | +| ttp | `cli/ttp.py:46` → `:80` | asyncio queue pump on `attacker.observation.*` | +| clusterer | `cli/workers.py:260` → `:304` | bus-woken on `attacker.observed` | +| campaign-clusterer | `cli/workers.py:308` → `:362` | bus-woken on `identity.>` | +| web | `cli/web.py:27` → `:148` | ThreadingTCPServer.serve_forever (blocking) | +| api | `cli/api.py:18` → `:37` | subprocess.Popen uvicorn | + +## Recommendation — ordered, stop when RAM is fine + +### Step 1 — Lazy command registration (do first; safe, high-leverage) +Smallest diff, zero new failure modes, helps with or without consolidation. Typer only +needs a module imported to *run* a command, not to *register* it. Defer the +`from . import (...)` so `decnet canary` loads canary's deps only, not all 26 tables. +Reversible. Expected: idle workers drop well below the 86 MB floor. + +### Step 2 — Consolidate the idle herd (only if RAM still bites after step 1) +`decnet supervise` runs the idle event-driven workers as tasks in ONE process — pay the +floor 1× instead of ~10×. Plumbing already exists; the supervisor is ~10 lines: + +```python +async with asyncio.TaskGroup() as tg: + for w in IDLE_WORKERS: + tg.create_task(w.run(bus)) # each already a bus-subscribed coroutine +``` + +**Cost to weigh:** consolidation trades RAM for **shared fate** — one crash takes down +~10 workers, one OOM kills the herd, and you lose per-worker systemd restart policy and +`MemoryMax=` caps. That's why step 1 comes first: free safety, and may make step 2 +unnecessary. + +### Step 3 — Merge the 3 scapy workers +Share the 76 MB scapy import once instead of 3×. + +### Projected trajectory +- 2.57 GB → **~1.3 GB** from lazy imports alone (nearly free) +- → **~0.9 GB** if also consolidating the herd + merging scapy (costs isolation) + +The first 1.3 GB is nearly free; the last 400 MB costs you process isolation.