anti 825d7d72c9 docs(1.1): RAM footprint analysis + release plan

Fleet resident set ~2.57GB across 18 workers; ~1.5GB is the 86MB import
floor paid 18x. Pinned root cause: topology/__init__ eager re-export of
generate drags the full SQLModel ORM (26 tables, ~38MB) into every worker.

2026-06-17 16:32:54 -04:00

DECNET RAM / Process-Footprint Improvements

Status: analysis complete, implementation not started. Measured 2026-06-17 on the dev box, 18 live decnet workers, CPython 3.11 (.311).

Headline

Fleet resident set ≈ 2.57 GB across 18 processes. The bulk is not workload — it is the same import floor paid 18 times over.

Part A — the universal import tax (measured)

Every worker pays ~86 MB at startup before doing any work:

interpreter            12 MB
+ import decnet.cli    74 MB   ← SQLModel/SQLAlchemy/Pydantic (~32MB)
                                 + EVERY decnet.web.db.models.* table
                                 + decnet.config + decnet.models
= floor                86 MB   paid 18× ≈ 1.5 GB of the 2.57 GB total

Measured cold, fresh interpreter each time (RSS):

| Layer | Resident | Who pays |
| --- | --- | --- |
| CPython interpreter | ~12 MB | everyone (shared COW) |
| `import decnet.cli` | +74 MB | every worker |
| └ SQLModel/SQLAlchemy/Pydantic | ~32 MB | the ORM chain |
| └ all `decnet.web.db.models.*` tables | ~20 MB | eagerly imported |
| `scapy.all` | +76 MB | only `collect`, `probe`, `sniffer` |
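
A cold measurement like the one tabulated above can be approximated by spawning a fresh interpreter per module and reading its peak RSS. This is a minimal sketch, assuming Linux or macOS (the `resource` module is POSIX-only). The helper name `cold_import_rss_mb` is hypothetical, and it reports peak RSS via `ru_maxrss`, which may not be the exact method behind the figures in this document.

```python
import subprocess
import sys

# Child script: optionally import one module, then print this process's peak RSS in MB.
_CHILD = """
import resource, sys
if len(sys.argv) > 1:
    __import__(sys.argv[1])
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
# ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
print(peak / 1024 if sys.platform != "darwin" else peak / (1024 * 1024))
"""

def cold_import_rss_mb(module=None):
    """Peak RSS (MB) of a fresh interpreter after importing `module` (or nothing)."""
    args = [sys.executable, "-c", _CHILD] + ([module] if module else [])
    out = subprocess.run(args, capture_output=True, text=True, check=True)
    return float(out.stdout.strip())
```

Calling `cold_import_rss_mb()` gives the bare-interpreter baseline; subtracting it from `cold_import_rss_mb("decnet.cli")` or `cold_import_rss_mb("scapy.all")` would give a per-layer delta comparable to the "Resident" column.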

Confirmed NOT in the universal path:

scapy: is scapy loaded after import decnet.cli? False. Only the three scapy workers (collect, probe, sniffer) pay it.
pandas / numpy / sklearn: no module-scope imports anywhere; they are already lazy-imported inside the functions that use them. The codebase got this right; leave it.
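
The "loaded after import?" check can be reproduced by importing the target in a fresh interpreter and looking the suspect module up in `sys.modules`. A minimal sketch; the helper name `loaded_after_import` is hypothetical and not part of the codebase.

```python
import subprocess
import sys

def loaded_after_import(target, probe):
    """True if importing `target` in a fresh interpreter also loads `probe`."""
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({target!r})\n"
        f"print({probe!r} in sys.modules)\n"
    )
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip() == "True"
```

Per the finding above, `loaded_after_import("decnet.cli", "scapy")` should return `False`. The same call with `"pandas"`, `"numpy"` or `"sklearn"` as the probe would guard against a module-scope import creeping back in.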

Root cause

decnet/cli/__init__.py:22-48 eagerly runs from . import (agent, api, ... ttp), importing all 26 command modules at process start. Each module pulls decnet.config + decnet.models + the decnet.web.db.models.* chain at module top, so decnet canary (which never touches TTP/swarm/webhook tables) still executes the SQLModel metaclass for every table.

importtime top offenders (pure model-table import cost, self time):

decnet.web.db.models.topology    21ms
decnet.web.db.models.attackers   15ms
decnet.models                    13ms
decnet.web.db.models.logs        11ms
... canary, ttp, swarm, auth, webhooks, orchestrator ...
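
A ranking like the one above can be derived from the stderr output of `python -X importtime`. The sketch below parses that output and sorts by self time; the function names are hypothetical, and the times come back in microseconds, as CPython reports them.

```python
def parse_importtime(stderr_text):
    """Parse `python -X importtime` output into (module, self_us, cumulative_us)."""
    rows = []
    for line in stderr_text.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (p.strip() for p in parts)
        if not self_us.isdigit():
            continue  # header row: "self [us] | cumulative | imported package"
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows

def top_offenders(stderr_text, prefix, n=5):
    """The n modules under `prefix` with the largest self time."""
    rows = [r for r in parse_importtime(stderr_text) if r[0].startswith(prefix)]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]
```

Typical use would be to capture the log with `python -X importtime -c "import decnet.cli" 2> importtime.log` and pass the file contents to `top_offenders(text, "decnet.")`.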

Part B — architecture map (for consolidation)

All 18 workers are already asyncio coroutines subscribing to one shared UNIX-socket bus (decnet/bus/), with a system.{worker}.control shutdown topic already wired and a system.{worker}.health heartbeat every 10s. They are already independent tasks — nothing needs re-architecting, only re-hosting.
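
To make the worker shape concrete, here is a rough sketch of a coroutine that heartbeats on `system.{worker}.health` and exits when told to shut down. `InMemoryBus`, `run_worker`, the payload format and the use of an `asyncio.Event` in place of a real `system.{worker}.control` subscription are all assumptions for illustration; the actual decnet/bus/ API is not shown in this document.

```python
import asyncio

class InMemoryBus:
    """Hypothetical stand-in for the UNIX-socket bus: it only records publishes."""
    def __init__(self):
        self.published = []

    async def publish(self, topic, payload):
        self.published.append((topic, payload))

async def run_worker(name, bus, shutdown, heartbeat_s=10.0):
    """Publish system.{name}.health every heartbeat_s until shutdown is set."""
    while not shutdown.is_set():
        await bus.publish(f"system.{name}.health", {"status": "ok"})
        try:
            # Sleep for one heartbeat interval, waking early on shutdown.
            await asyncio.wait_for(shutdown.wait(), timeout=heartbeat_s)
        except asyncio.TimeoutError:
            pass

async def demo():
    bus, shutdown = InMemoryBus(), asyncio.Event()
    task = asyncio.create_task(run_worker("canary", bus, shutdown, heartbeat_s=0.01))
    await asyncio.sleep(0.05)
    shutdown.set()  # stands in for a message on system.canary.control
    await task
    return bus.published
```

Because each worker is just a coroutine taking a bus handle, the same function can be started by its own process or scheduled alongside others in one event loop, which is what "re-hosting" amounts to.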

| Tier | Workers | Verdict |
| --- | --- | --- |
| Broker | `bus` | Stays alone: it's the hub. |
| Already multiprocess by design | `api`/uvicorn, `web` (ThreadingTCPServer) | Leave them. |
| scapy + blocking sniff threads | `collect`, `probe`, `sniffer` | Keep out of main loop (76 MB scapy + GIL-thrashing threads). Merge these 3 → pay scapy once. |
| Heavy resident state / CPU | `profiler` (353 MB), `ttp` (308 MB) | Keep separate: big live heaps, real CPU work; co-locating serializes them under GIL. |
| The idle herd ⭐ | `webhook`, `canary`, `listener`, `forwarder`, `mutate`, `orchestrator`, `reconciler`, `enrich`, + lighter clusterers | The prize. ~10 mostly-idle event-driven tasks each paying the 86 MB floor to `await` a bus event. Collapse into ONE supervisor. |

Loop-type evidence (from architecture map):

| Worker | Loop entry | Loop kind |
| --- | --- | --- |
| bus | `cli/bus.py:10` → `bus/worker.py:44` | asyncio serve_forever + heartbeat |
| profiler | `cli/profiler.py:10` → `:33` | asyncio, 30s wakeup, batch 500 logs |
| ttp | `cli/ttp.py:46` → `:80` | asyncio queue pump on `attacker.observation.*` |
| clusterer | `cli/workers.py:260` → `:304` | bus-woken on `attacker.observed` |
| campaign-clusterer | `cli/workers.py:308` → `:362` | bus-woken on `identity.>` |
| web | `cli/web.py:27` → `:148` | ThreadingTCPServer.serve_forever (blocking) |
| api | `cli/api.py:18` → `:37` | subprocess.Popen uvicorn |

Recommendation — ordered, stop when RAM is fine

Step 1 — Lazy command registration (do first; safe, high-leverage)

Smallest diff, zero new failure modes, helps with or without consolidation. Typer only needs a command's module imported when that command runs, not when it is registered, as long as registration goes through a thin stub that imports the real module on invocation. Defer the from . import (...) so decnet canary loads canary's deps only, not all 26 tables. Reversible. Expected: idle workers drop well below the 86 MB floor.
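
The deferral idea can be illustrated with the standard library alone: keep a table of command names mapped to import paths, and import a module only when its command is dispatched. This is a sketch of the technique, not Typer's API and not the DECNET CLI; `COMMANDS`, `resolve` and `main` are hypothetical names, and the stdlib targets are placeholders for real command modules.

```python
import importlib

# Command name -> "module:function". Stdlib targets are placeholders; in DECNET
# these would point at the individual command modules (hypothetical layout).
COMMANDS = {
    "hls": "colorsys:rgb_to_hls",
    "dumps": "json:dumps",
}

def resolve(command):
    """Import only the module that backs `command` and return its callable."""
    module_name, func_name = COMMANDS[command].split(":")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)

def main(argv):
    """Dispatch argv[0] to its command, importing nothing else."""
    command, *args = argv
    if command not in COMMANDS:
        raise SystemExit(f"unknown command: {command}")
    return resolve(command)(*args)
```

Registering the table costs only a dict of strings at startup. The import cost of each command module is paid by the process that actually runs that command.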

Step 2 — Consolidate the idle herd (only if RAM still bites after step 1)

decnet supervise runs the idle event-driven workers as tasks in ONE process — pay the floor 1× instead of ~10×. Plumbing already exists; the supervisor is ~10 lines:

import asyncio

async def supervise(bus):
    async with asyncio.TaskGroup() as tg:   # requires Python 3.11+
        for w in IDLE_WORKERS:
            tg.create_task(w.run(bus))   # each already a bus-subscribed coroutine

Cost to weigh: consolidation trades RAM for shared fate — one crash takes down ~10 workers, one OOM kills the herd, and you lose per-worker systemd restart policy and MemoryMax= caps. That's why step 1 comes first: free safety, and may make step 2 unnecessary.

Step 3 — Merge the 3 scapy workers

Share the 76 MB scapy import once instead of 3×.

Projected trajectory

2.57 GB → ~1.3 GB from lazy imports alone (nearly free)
→ ~0.9 GB if also consolidating the herd + merging scapy (costs isolation)

The first ~1.3 GB of savings is nearly free; the last 400 MB costs you process isolation.
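
The headline figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this document; the two "after" values are the document's projections, not quantities derived here.

```python
WORKERS = 18
FLOOR_MB = 86          # 12 MB interpreter + 74 MB `import decnet.cli`
FLEET_MB = 2570        # measured fleet resident set (~2.57 GB)
SCAPY_MB = 76
SCAPY_WORKERS = 3      # collect, probe, sniffer

floor_total_mb = WORKERS * FLOOR_MB               # import floor paid 18 times
floor_share = floor_total_mb / FLEET_MB           # fraction of the fleet total
scapy_saving_mb = (SCAPY_WORKERS - 1) * SCAPY_MB  # paying scapy once instead of 3 times

after_lazy_mb = 1300   # projection from the text
after_all_mb = 900     # projection from the text
step1_saving_mb = FLEET_MB - after_lazy_mb        # saved by lazy imports alone
later_saving_mb = after_lazy_mb - after_all_mb    # saved by consolidation + scapy merge

print(floor_total_mb, round(floor_share, 2), scapy_saving_mb,
      step1_saving_mb, later_saving_mb)
```

This gives a 1548 MB floor (about 60% of the fleet total), 152 MB from merging the scapy workers, and savings of 1270 MB and 400 MB for the two stages of the trajectory.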
