wiki: merge dev-guide-design into main

2026-04-18 06:09:32 -04:00
3 changed files with 381 additions and 0 deletions

Design-Overview.md

@@ -0,0 +1,84 @@
# Design Overview
A short tour of how DECNET is split into processes and why. For knob-level detail see [[Environment-Variables]]; for storage internals see [[Database-Drivers]].
## The microservice split
DECNET runs as a small constellation of workers around a FastAPI process. Each worker is a first-class CLI subcommand; two of them, the profiler and the sniffer, can also be embedded in the API process for simple single-host deploys.
| Subsystem | Launch standalone | Embed in API | Primary job |
|-----------|-------------------|--------------|-------------|
| Web / API | `decnet web --daemon` | (this is the host) | FastAPI app, dashboard, REST endpoints |
| Collector | `decnet collect --daemon` | no (always standalone) | Ingest RFC 5424 syslog from deckies |
| Correlator | `decnet correlate --daemon` | no (always standalone) | Session + attacker correlation |
| Profiler | `decnet profiler --daemon` | `DECNET_EMBED_PROFILER=1` | Attacker profiling / scoring |
| Sniffer | `decnet sniffer --daemon` | `DECNET_EMBED_SNIFFER=1` | Passive PCAP on the decoy bridge |
| Prober | `decnet probe --daemon` | no (always standalone) | Active realism checks |
| Mutator | `decnet mutate --daemon --watch` | no (always standalone) | Runtime fleet mutation |
These subcommands are also how `decnet deploy` spawns the workers: the deploy path shells out to `python -m decnet.cli <worker> --daemon`, so there is exactly one code path whether you run interactively or under systemd.
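As a rough illustration, the argv for each worker might be assembled like this. `worker_argv` and `EXTRA_FLAGS` are hypothetical names, not functions in `decnet/cli.py`; the flags mirror the launch column of the table, where only the mutator adds `--watch`.

```python
import sys

# Hypothetical map of worker subcommand -> extra flags, mirroring the table.
# Only the mutator is launched with --watch.
EXTRA_FLAGS: dict[str, list[str]] = {
    "web": [],
    "collect": [],
    "correlate": [],
    "profiler": [],
    "sniffer": [],
    "probe": [],
    "mutate": ["--watch"],
}


def worker_argv(worker: str) -> list[str]:
    """Build the command line a supervisor would exec for one worker."""
    if worker not in EXTRA_FLAGS:
        raise ValueError(f"unknown worker: {worker}")
    # Reuse the current interpreter so the spawned worker sees the same venv.
    return [sys.executable, "-m", "decnet.cli", worker, "--daemon", *EXTRA_FLAGS[worker]]
```

Using `sys.executable` rather than a bare `python` keeps the spawned worker inside the same virtualenv as the parent.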
## Why split them at all
### Resilience
A crashed sniffer must not take the API down, and a stuck profiler must not block the collector's attacker writes. Splitting into processes gives the usual crash-domain isolation: each unit is supervised under systemd (see [[Systemd-Setup]]) and restarts on its own schedule.
### Scaling
In UNIHOST mode everything lives on one machine. In SWARM / MULTIHOST mode the heavy workers (sniffer, profiler) can move to dedicated hosts while the API stays on the public-facing bridge. Because each worker reads the same repository via `get_repository()`, they are effectively stateless w.r.t. each other — they coordinate through the DB, not through shared memory.
### Write-load isolation
The API serves reads; the collector, correlator, and profiler are write-heavy. Under SQLite, single-writer contention was the #1 latency source when everything ran in-process. Breaking the writers out and letting them hold short transactions independently drops lock contention dramatically. If you outgrow even that, flip `DECNET_DB_TYPE=mysql`.
### Observability
Each subsystem emits its own RFC 5424 stream tagged with its own APP-NAME (`decnet.collector`, `decnet.sniffer`, `decnet.profiler`, …). That makes triage in the SIEM mechanical: filter by app, not by guesswork. Embedded mode muddies this because everything shares the API process.
## Embed mode
For dev and for the smallest possible single-host deploy, two workers can run inside the FastAPI process:
- `DECNET_EMBED_PROFILER=1` — profiler starts in a thread on app startup.
- `DECNET_EMBED_SNIFFER=1` — sniffer starts in a thread on app startup.
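A minimal sketch of what embed mode amounts to, assuming each worker exposes a loop that runs until a `threading.Event` is set. `start_embedded` and the loop signature are hypothetical, and treating only the literal value `"1"` as "on" is an assumption; the real parsing lives in `decnet/env.py`.

```python
import os
import threading
from typing import Callable

# A worker loop runs until the stop event is set (hypothetical signature).
WorkerLoop = Callable[[threading.Event], None]

# Env var that switches each embeddable worker on, per the list above.
EMBED_FLAGS = {
    "profiler": "DECNET_EMBED_PROFILER",
    "sniffer": "DECNET_EMBED_SNIFFER",
}


def start_embedded(
    loops: dict[str, WorkerLoop], stop: threading.Event
) -> list[threading.Thread]:
    """Start one daemon thread per worker whose embed flag is exactly "1"."""
    threads: list[threading.Thread] = []
    for name, env_var in EMBED_FLAGS.items():
        if os.environ.get(env_var) != "1":
            continue  # off by default: the standalone worker owns this job
        thread = threading.Thread(
            target=loops[name], args=(stop,), name=f"embedded-{name}", daemon=True
        )
        thread.start()
        threads.append(thread)
    return threads
```

In a FastAPI app this would be called from the startup (lifespan) hook, with `stop.set()` on shutdown. Daemon threads keep a wedged worker from blocking interpreter exit.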
These are off by default. The rest of the constellation (collector, correlator, prober, mutator) always runs as standalone processes — `decnet deploy` supervises them through a small process registry in `decnet/cli.py::_service_registry`, which respawns any unit that dies. Embed mode exists only for the profiler and the sniffer, which are the two workers cheap enough to live in-process during dev.
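The respawn behaviour can be sketched with `subprocess.Popen.poll()`, which returns `None` while a child runs and its exit code once it has exited. `ProcessRegistry` is a hypothetical stand-in for `_service_registry`, not its actual implementation.

```python
import subprocess


class ProcessRegistry:
    """Hypothetical registry: one argv per unit, respawned when it exits."""

    def __init__(self, commands: dict[str, list[str]]) -> None:
        self.commands = commands
        self.procs: dict[str, subprocess.Popen] = {}

    def start_all(self) -> None:
        for name, argv in self.commands.items():
            self.procs[name] = subprocess.Popen(argv)

    def respawn_dead(self) -> list[str]:
        """Restart every unit whose process has exited; return their names."""
        restarted: list[str] = []
        for name, proc in list(self.procs.items()):
            if proc.poll() is not None:  # exit code available => child is gone
                self.procs[name] = subprocess.Popen(self.commands[name])
                restarted.append(name)
        return restarted

    def stop_all(self) -> None:
        for proc in self.procs.values():
            proc.terminate()
            proc.wait()
```

A real supervisor would call `respawn_dead()` on a timer and add backoff so a unit that crashes on startup does not spin.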
### The duplication risk
Do **not** run embed mode *and* the standalone worker at the same time. That is how you get:
- **Duplicated events** — both sniffer copies persist the same packet.
- **Skipped events** — both profilers race on the same attacker row; one loses.
The env doc ([[Environment-Variables]]) flags this explicitly. The rule: pick one mode per host per worker. Systemd units shipped under `deploy/` assume standalone.
## Storage layer — the short version
DECNET uses a single repository pattern:
- `SQLModelRepository` is the base class. It holds all portable SQLModel / SQLAlchemy logic, including queries and transactions.
- `SQLiteRepository` and `MySQLRepository` subclass it and override only the dialect-specific bits (pragmas, pool config, upsert flavor).
- `get_repository()` in `decnet/web/db/factory.py` picks one based on `DECNET_DB_TYPE` (`sqlite` or `mysql`) and wraps it with telemetry.
- FastAPI routes take the repo via the `get_repo` dependency in `decnet/web/dependencies.py`.
Never import `SQLiteRepository` directly. See [[Database-Drivers]] for schema, migration, and tuning.
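The factory shape can be sketched with stub classes. Everything below is hypothetical: the method names, the `TelemetryRepository` wrapper, and falling back to `sqlite` when `DECNET_DB_TYPE` is unset are assumptions, and the real classes carry SQLModel sessions rather than canned return values.

```python
import os


class SQLModelRepository:
    """Portable base: shared queries and transactions (stubbed here)."""
    dialect = "generic"

    def list_attackers(self) -> list[str]:
        return []  # stand-in for a real SQLModel query


class SQLiteRepository(SQLModelRepository):
    dialect = "sqlite"  # would override pragmas / upsert flavor


class MySQLRepository(SQLModelRepository):
    dialect = "mysql"  # would override pool config / upsert flavor


class TelemetryRepository:
    """Hypothetical wrapper that counts method calls before delegating."""

    def __init__(self, inner: SQLModelRepository) -> None:
        self.inner = inner
        self.calls: dict[str, int] = {}

    def __getattr__(self, name: str):
        # Only reached for attributes not found on the wrapper itself.
        attr = getattr(self.inner, name)
        if not callable(attr):
            return attr

        def wrapped(*args, **kwargs):
            self.calls[name] = self.calls.get(name, 0) + 1
            return attr(*args, **kwargs)

        return wrapped


_BACKENDS = {"sqlite": SQLiteRepository, "mysql": MySQLRepository}


def get_repository() -> TelemetryRepository:
    db_type = os.environ.get("DECNET_DB_TYPE", "sqlite").lower()
    try:
        backend = _BACKENDS[db_type]
    except KeyError:
        raise ValueError(f"unsupported DECNET_DB_TYPE: {db_type!r}") from None
    return TelemetryRepository(backend())
```

Because callers only ever see the object `get_repository()` returns, switching backends is an env-var change rather than an import change.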
## Going deeper
The `development/` directory in the repo has low-level flow material that is too noisy to mirror here:
- `development/execution_graphs.md` — per-command call graphs.
- `development/complete_execution_graph.md` — one big graph across the whole system.
- `development/ast_graph.md` — static call/symbol graph.
If you are chasing a bug across subsystem boundaries, start from those.
## Related pages
- [[Developer-Guide]] — setup, layout, conventions.
- [[Writing-a-Service-Plugin]] — add a new honeypot service.
- [[Database-Drivers]] — SQLite vs MySQL.
- [[Environment-Variables]] — the full env surface.
- [[Systemd-Setup]] — running each worker as a supervised unit.

Developer-Guide.md

@@ -0,0 +1,123 @@
# Developer Guide
How to hack on DECNET. If you just want to deploy it, see [[Home]] and [[INI-Config-Format]] instead.
## Environment setup
DECNET pins its runtime deps in `requirements.lock`. Always work inside the project virtualenv — do not install into the system interpreter.
```bash
cd /path/to/DECNET
python -m venv .venv
source .venv/bin/activate
pip install -e .
```
Every subsequent shell must `source .venv/bin/activate` before running `pip`, `pytest`, or `decnet`. The CLI entrypoint is registered in `pyproject.toml` and resolves to `decnet.cli:app`.
To confirm the dev install:
```bash
decnet services # list registered service plugins
decnet distros # list base-image archetypes
pytest -q # run the suite
```
## Repository layout
High-level tour. Only the directories you will touch often are listed.
| Path | What lives there |
|------|------------------|
| `decnet/cli.py` | Typer app. Every `decnet <verb>` subcommand is defined here. |
| `decnet/services/` | Service plugins. One file per honeypot service. See [[Writing-a-Service-Plugin]]. |
| `decnet/services/base.py` | `BaseService` contract. |
| `decnet/services/registry.py` | Auto-discovery of `BaseService` subclasses. |
| `decnet/composer.py` | Turns a fleet spec into a `docker-compose` file. |
| `decnet/fleet.py` | Fleet planning: which decky runs which services on which IP. |
| `decnet/archetypes.py`, `decnet/distros.py` | OS personas + base-image selection. |
| `decnet/os_fingerprint.py` | TCP/IP stack tuning to bend nmap fingerprints toward a chosen persona. |
| `decnet/env.py` | Central env-var parsing (`DECNET_DB_TYPE`, `DECNET_EMBED_*`, …). |
| `decnet/collector/` | Syslog / RFC 5424 ingest worker. |
| `decnet/correlation/` | Session and attacker correlation worker. |
| `decnet/profiler/` | Attacker profiler. Embeddable or standalone — see [[Design-Overview]]. |
| `decnet/sniffer/` | Passive PCAP sniffer worker. Same embed/standalone split. |
| `decnet/mutator/` | Runtime mutation of the decoy fleet. |
| `decnet/prober/` | Active probe / realism checker. |
| `decnet/engine/` | Deploy / teardown orchestration. |
| `decnet/web/` | FastAPI app + dashboard + repository layer. |
| `decnet/web/db/` | `SQLModelRepository` base and `sqlite/`, `mysql/` subclasses. See [[Database-Drivers]]. |
| `decnet/logging/` | RFC 5424 emitters and the syslog bridge used by service containers. |
| `templates/<slug>/` | Dockerfile + service config bundle built into the service image. |
| `tests/` | Pytest suite. Mirrors the `decnet/` tree loosely. |
| `development/` | Low-level design notes and generated graphs. Not shipped. |
## Coding conventions
### Lint and static checks
- **ruff** is the single source of truth for style. Config lives in `ruff.toml`. Run `ruff check decnet tests` before committing.
- **bandit** is used for security linting of `decnet/`. Fix findings rather than silencing them; if a silence is unavoidable, scope the `# nosec` comment to one line and explain why.
### Stealth in probes and banners
Never reveal DECNET identity in anything an attacker can see. That means:
- No `User-Agent: DECNET/...` in the prober or in any service plugin.
- No banners, MOTDs, `/etc/issue` contents, HTTP `Server:` headers, or SSH version strings that mention DECNET, honeypot, decoy, fake, or any internal codename.
- No log filenames or env var names leaking into emitted service output.
This rule is load-bearing. A single leaked banner turns the whole fleet into a well-known signature.
### Dependency injection for storage
Do not write `from decnet.web.db.sqlite.repository import SQLiteRepository` in new code. Ever.
- **In workers / CLI / library code**: call `get_repository()` from `decnet/web/db/factory.py`. It reads `DECNET_DB_TYPE` and returns the right backend, already wrapped with telemetry.
- **In FastAPI route handlers**: take `repo: BaseRepository = Depends(get_repo)` — defined in `decnet/web/dependencies.py`. This keeps the test harness able to swap in an in-memory repo.
The direct-import rule is enforced by convention and by reviewers. If you find an old direct import while working on a file, fix it in the same commit.
See [[Database-Drivers]] for how SQLite and MySQL subclasses differ.
## Tests
### Layout
- `tests/` — fast unit tests. Run by default.
- `tests/api/` — FastAPI `TestClient` tests.
- `tests/docker/` — integration tests that spin real containers. Opt-in.
- `tests/live/` — full end-to-end against a live deploy. Opt-in.
- `tests/perf/`, `tests/stress/` — performance and soak. Opt-in.
- `tests/service_testing/` — per-service plugin smoke tests.
- `tests/conftest.py` — shared fixtures, including repo factories.
### Running
```bash
pytest -q # fast suite
pytest tests/api -q # just the API
pytest tests/service_testing -q # plugin smoke
pytest -k ssh # single topic
```
### Rules
- Every new feature ships with pytest coverage. No exceptions.
- Never hand off code that is not running or not 100% green. If you cannot finish the tests, say so — do not push.
- Do not use scapy's `sniff()` inside a `TestClient` lifespan test. The sniff thread hangs pytest teardown. Use static source inspection or a fake socket instead.
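One way to keep capture code testable without `sniff()` is to write the loop against a minimal `recv`-style source and hand it a replay double in tests. `capture`, `FrameSource`, and `FakeSocket` are hypothetical names for illustration, not DECNET's sniffer API.

```python
import threading
from typing import Callable, Protocol


class FrameSource(Protocol):
    def recv(self, bufsize: int) -> bytes: ...


def capture(sock: FrameSource, handle: Callable[[bytes], None], stop: threading.Event) -> int:
    """Read frames until stop is set or the source returns b""; return the count handled."""
    count = 0
    while not stop.is_set():
        frame = sock.recv(65535)
        if not frame:
            break  # source exhausted or closed
        handle(frame)
        count += 1
    return count


class FakeSocket:
    """Test double that replays canned frames, then reports EOF with b""."""

    def __init__(self, frames: list[bytes]) -> None:
        self._frames = list(frames)

    def recv(self, bufsize: int) -> bytes:
        return self._frames.pop(0)[:bufsize] if self._frames else b""
```

In a test the fake returns `b""` after its canned frames, so the loop ends on its own and nothing is left running at teardown.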
## Commit style
- Follow the existing log: short imperative subject, `scope:` prefix when obvious (`feat(sniffer):`, `fix(web-ui):`, `test(ssh):`, `chore:`).
- Run the relevant `pytest` subset before committing. A broken main is worse than a late commit.
- Never add `Co-Authored-By:` or any Claude / AI attribution trailer.
- Prefer a new commit over `--amend`. When a pre-commit hook fails, the commit is never created, so a follow-up `git commit --amend` folds your changes into the previous commit instead and hides them there.
## Related pages
- [[Design-Overview]] — why workers are split out and how embed mode works.
- [[Writing-a-Service-Plugin]] — step-by-step plugin authoring.
- [[Database-Drivers]] — the repository pattern in detail.
- [[Environment-Variables]] — every `DECNET_*` knob.
- [[INI-Config-Format]] — declarative deploy specs.

Writing-a-Service-Plugin.md

@@ -0,0 +1,174 @@
# Writing a Service Plugin
A service plugin is what makes a decky look like an SSH box, an SMB share, an MSSQL server, or whatever else. Plugins are auto-discovered from `decnet/services/`. You add a file, you get a service.
For runtime INI-driven custom services (no Python code at all), see [[Custom-Services]] — this page is for first-class plugins baked into the codebase.
## The contract
Every plugin subclasses `BaseService` from `decnet/services/base.py`:
```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseService(ABC):
    name: str                      # unique slug, e.g. "ssh"
    ports: list[int]               # in-container listen ports
    default_image: str             # Docker image tag, or "build"
    fleet_singleton: bool = False  # True = one instance fleet-wide

    @abstractmethod
    def compose_fragment(
        self,
        decky_name: str,
        log_target: str | None = None,
        service_cfg: dict | None = None,
    ) -> dict: ...

    def dockerfile_context(self) -> Path | None:
        return None
```
Rules the composer enforces so you do not have to:
- Networking keys (`networks`, `ipv4_address`, `mac_address`) are injected by `decnet/composer.py`. Do not set them in `compose_fragment`.
- If you return `"build": {"context": ...}`, make sure `dockerfile_context()` returns the same path so `decnet deploy` can pre-build the image.
- `log_target` is `"ip:port"` when log forwarding is on, else `None`. Pass it into the container as an env var and let the in-container rsyslog bridge handle the rest.
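The networking rule can be pictured as a merge step that refuses plugin-supplied networking keys and then adds its own. `attach_networking` is a hypothetical helper, not the code in `decnet/composer.py`; the key placement (`ipv4_address` under `networks.<name>`, `mac_address` at service level) follows the Compose file format.

```python
import copy

# Keys the composer owns; a plugin fragment must not set them.
NETWORK_KEYS = {"networks", "ipv4_address", "mac_address"}


def attach_networking(fragment: dict, network: str, ipv4: str, mac: str) -> dict:
    """Reject plugin-set networking keys, then return a copy with the composer's keys added."""
    leaked = fragment.keys() & NETWORK_KEYS
    if leaked:
        raise ValueError(f"compose_fragment must not set: {sorted(leaked)}")
    service = copy.deepcopy(fragment)
    service["networks"] = {network: {"ipv4_address": ipv4}}
    service["mac_address"] = mac
    return service
```

Deep-copying leaves the plugin's returned dict untouched, so tests can keep asserting against the original fragment.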
## Registration
There is no registration step. The registry in `decnet/services/registry.py` walks the `decnet/services/` package at import time, imports every module, and picks up every `BaseService` subclass via `__subclasses__()`. Your plugin appears in `decnet services` and in `all_services()` the moment its file exists in the right directory.
To verify:
```bash
decnet services | grep <your-slug>
```
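The discovery step described above can be sketched with `pkgutil.iter_modules` and `importlib.import_module`. This `discover` function is hypothetical and returns classes keyed by `name`; the real `registry.py` may instantiate them or key them differently. Note that `__subclasses__()` only reports direct subclasses, so a plugin that inherits from another plugin would be missed by this sketch.

```python
import importlib
import pkgutil
from types import ModuleType


def discover(package: ModuleType, base: type) -> dict[str, type]:
    """Import every module in `package`, then map direct subclasses of `base` by `name`."""
    for info in pkgutil.iter_modules(package.__path__, prefix=package.__name__ + "."):
        importlib.import_module(info.name)  # importing runs the class definitions
    # __subclasses__() only lists *direct* subclasses that have been imported.
    return {cls.name: cls for cls in base.__subclasses__() if hasattr(cls, "name")}
```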
## Templates
If your service needs a custom image (almost all do), drop the build context under `templates/<slug>/`:
```
templates/myservice/
    Dockerfile
    entrypoint.sh
    config/
        ...
```
Conventions the existing plugins follow:
- Base the image on `debian:bookworm-slim` unless you have a reason to diverge. Diverging is fine: heterogeneity is good, and some services already use Alpine or CentOS-derived images.
- Bake an rsyslog or equivalent bridge into the image so the container emits RFC 5424 on stdout.
- Never write DECNET, honeypot, or decoy strings into the image, banners, MOTDs, config files, or user-agents. See the stealth rule in [[Developer-Guide]].
## A minimal plugin
The smallest real plugin is about 50 lines. This one wraps a pre-built image and needs no Dockerfile:
```python
# decnet/services/echoecho.py
from decnet.services.base import BaseService


class EchoEchoService(BaseService):
    """
    Tiny TCP echo service. Useful as a template and for testing the composer.

    service_cfg keys:
        greeting    First line sent on connect. Default: empty.
    """

    name = "echoecho"
    ports = [7]
    default_image = "ghcr.io/example/echoecho:1.0"
    fleet_singleton = False

    def compose_fragment(
        self,
        decky_name: str,
        log_target: str | None = None,
        service_cfg: dict | None = None,
    ) -> dict:
        cfg = service_cfg or {}
        env: dict = {
            "NODE_NAME": decky_name,
            "ECHO_GREETING": cfg.get("greeting", ""),
        }
        if log_target:
            env["SYSLOG_TARGET"] = log_target

        fragment: dict = {
            "image": self.default_image,
            "container_name": f"{decky_name}-echoecho",
            "restart": "unless-stopped",
            "environment": env,
        }
        return fragment
```
That is the whole plugin. Drop it in `decnet/services/echoecho.py`, run `decnet services`, and it shows up.
## Adding a build context
If you need a custom image, reference `templates/<slug>/` and implement `dockerfile_context`:
```python
from pathlib import Path

from decnet.services.base import BaseService

# decnet/services/echoecho.py -> repo root -> templates/echoecho
TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" / "echoecho"


class EchoEchoService(BaseService):
    name = "echoecho"
    ports = [7]
    default_image = "build"

    def compose_fragment(self, decky_name, log_target=None, service_cfg=None):
        return {
            "build": {"context": str(TEMPLATES_DIR)},
            "container_name": f"{decky_name}-echoecho",
            "restart": "unless-stopped",
            "environment": {"NODE_NAME": decky_name},
        }

    def dockerfile_context(self) -> Path:
        # Must match the build context above so `decnet deploy` can pre-build it.
        return TEMPLATES_DIR
```
Look at `decnet/services/ssh.py` for a fully worked, stealth-aware example including a per-decky quarantine bind-mount.
## Per-service persona config
`service_cfg` is the dict pulled from the matching `[service.<slug>]` section of the INI (see [[INI-Config-Format]]). Keep the keys documented in the class docstring — that docstring is the only user-facing reference.
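As an illustration of how an INI section could become `service_cfg`, the stdlib `configparser` turns a `[service.<slug>]` section into a flat dict of strings. `service_cfg_from_ini` is hypothetical; DECNET's own loader may coerce types or handle defaults differently (see [[INI-Config-Format]]).

```python
import configparser


def service_cfg_from_ini(text: str, slug: str) -> dict[str, str]:
    """Return the [service.<slug>] section as a plain dict, or {} if it is absent."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = f"service.{slug}"
    # Values stay strings; option names are lowercased by configparser's default optionxform.
    return dict(parser[section]) if parser.has_section(section) else {}
```

Because option names are lowercased, documenting keys in lowercase in the class docstring avoids surprises.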
## Pytest coverage
Every plugin ships with tests. Drop them under `tests/service_testing/test_<slug>.py`. Cover at minimum:
- Instantiation + registry lookup: `all_services()["echoecho"]` resolves.
- `compose_fragment` returns the expected keys for a given `decky_name` and `service_cfg`.
- Absence of DECNET / honeypot strings in rendered env, command, and template files — this is the stealth rule made executable.
- If `dockerfile_context()` is set, that the path exists and contains a `Dockerfile`.
Run `pytest tests/service_testing -q` before committing. Features without tests do not land — see [[Developer-Guide]].
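The stealth check from the list above can be turned into a reusable helper that walks a compose fragment and reports where a forbidden token appears. `find_leaks` is a hypothetical helper, and its token list covers only the strings named in this guide; extend it with internal codenames.

```python
import re

# Tokens the stealth rule forbids in attacker-visible output.
FORBIDDEN = re.compile(r"decnet|honeypot|decoy", re.IGNORECASE)


def find_leaks(value, path: str = "$") -> list[str]:
    """Walk dicts, lists, and strings; return paths to keys or values that leak."""
    leaks: list[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            if FORBIDDEN.search(str(key)):
                leaks.append(f"{path}.{key} (key)")  # e.g. a leaking env var name
            leaks.extend(find_leaks(item, f"{path}.{key}"))
    elif isinstance(value, (list, tuple)):
        for i, item in enumerate(value):
            leaks.extend(find_leaks(item, f"{path}[{i}]"))
    elif isinstance(value, str) and FORBIDDEN.search(value):
        leaks.append(path)
    return leaks
```

In a plugin test this would read `assert find_leaks(EchoEchoService().compose_fragment("web01")) == []`.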
## Checklist
- [ ] New file under `decnet/services/<slug>.py`, subclasses `BaseService`.
- [ ] `name`, `ports`, `default_image` set. `fleet_singleton` if applicable.
- [ ] `compose_fragment` returns networking-free compose dict.
- [ ] If `default_image == "build"`, `dockerfile_context()` returns the context path.
- [ ] `templates/<slug>/` exists with a Dockerfile (if building).
- [ ] No DECNET / honeypot / decoy strings anywhere the attacker can see.
- [ ] `service_cfg` keys documented in the class docstring.
- [ ] Pytest coverage under `tests/service_testing/`.
- [ ] `decnet services` lists the new slug.
- [ ] Commit follows the style in [[Developer-Guide]].
## Related pages
- [[Developer-Guide]] — conventions, DI rules, commit style.
- [[Custom-Services]] — declarative INI-only services.
- [[INI-Config-Format]] — the deploy spec format.
- [[Design-Overview]] — where plugins fit in the larger picture.