diff --git a/Environment-Variables.md b/Environment-Variables.md
new file mode 100644
index 0000000..ab34641
--- /dev/null
+++ b/Environment-Variables.md
@@ -0,0 +1,190 @@
+# Environment Variables
+
+DECNET reads configuration from process environment. On import, `decnet/env.py`
+loads `.env.local` first (preferred, git-ignored) then `.env` from the project
+root. Any variable already present in the shell environment wins over both
+files.
+
+Only the variables listed below are recognised. Anything else is noise.
+
+- Source of truth: [`decnet/env.py`](https://git.resacachile.cl/anti/DECNET/src/branch/main/decnet/env.py)
+- Starter template: [`env.config.example`](https://git.resacachile.cl/anti/DECNET/src/branch/main/env.config.example)
+
+See also: [DB drivers](Database-Drivers), [Logging](Logging-and-Syslog),
+[Systemd](Systemd-Setup), [Tracing](Tracing-and-Profiling).
+
+## Validation rules
+
+Two validators live in `decnet/env.py`:
+
+- `_port(name, default)` — integer in `[1, 65535]`. Applies to
+  `DECNET_API_PORT`, `DECNET_WEB_PORT`, `DECNET_DB_PORT`.
+- `_require_env(name)` — variable must be set, and must not be a known-bad
+  default. Under pytest (`PYTEST*` env var present) the bad-value check is
+  skipped so test fixtures can use sentinel values.
+
+### Known-bad-values block list
+
+`_require_env` rejects these case-insensitive literals:
+
+- `admin`
+- `secret`
+- `password`
+- `changeme`
+- `fallback-secret-key-change-me`
+
+### JWT secret length rule
+
+When `name == "DECNET_JWT_SECRET"`, the value must be at least **32 bytes**.
+This matches HS256's minimum key length (RFC 7518 §3.2 — "A key of the same
+size as the hash output [...] or larger MUST be used"). The check is relaxed
+when `DECNET_DEVELOPER=true`.
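The validation rules above can be sketched roughly as follows. This is an illustrative reconstruction from the rules stated in this section, not the code in `decnet/env.py`: the names `port` and `require_env`, the explicit `env` mapping parameter, the error messages, and the case-insensitive reading of `DECNET_DEVELOPER` are all assumptions.

```python
import os

# Block list copied from the section above; matching is case-insensitive.
KNOWN_BAD = {
    "admin",
    "secret",
    "password",
    "changeme",
    "fallback-secret-key-change-me",
}


def port(name, default, env=os.environ):
    """Return an integer port in [1, 65535], or raise ValueError (hypothetical helper)."""
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if not 1 <= value <= 65535:
        raise ValueError(f"{name} must be in [1, 65535], got {value}")
    return value


def require_env(name, env=os.environ):
    """Return the value of a required variable after the checks described above."""
    value = env.get(name)
    if not value:
        raise ValueError(f"{name} must be set")

    # The bad-value check is skipped when any PYTEST* variable is present.
    under_pytest = any(key.startswith("PYTEST") for key in env)
    if not under_pytest and value.lower() in KNOWN_BAD:
        raise ValueError(f"{name} is set to a known-bad default")

    # Assumption: only the literal "true" (any case) counts as developer mode.
    developer = env.get("DECNET_DEVELOPER", "").lower() == "true"
    if name == "DECNET_JWT_SECRET" and not developer:
        if len(value.encode("utf-8")) < 32:
            raise ValueError(f"{name} must be at least 32 bytes")
    return value
```

Passing the environment as a mapping keeps the sketch testable without touching the real process environment; calling `require_env("DECNET_JWT_SECRET")` with no second argument reads `os.environ`.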
+
+## System logging
+
+| Name | Type | Default | Required | Consequence |
+|------|------|---------|----------|-------------|
+| `DECNET_SYSTEM_LOGS` | path | `decnet.system.log` | No | Destination for the RFC 5424 `RotatingFileHandler` installed by `decnet/config.py`. All microservice daemons (api, sniffer, profiler, collector) append here. Skipped under pytest. |
+
+## Embedded workers
+
+These are escape hatches — leave them unset in normal deployments. `decnet
+deploy` always spawns standalone daemons, and embedding the same worker inside
+the API duplicates DB writes and sniffer packets.
+
+| Name | Type | Default | Required | Consequence |
+|------|------|---------|----------|-------------|
+| `DECNET_EMBED_PROFILER` | bool (`true`/other) | `false` | No | Embed profiler in API process. Do not combine with `decnet profiler --daemon`. |
+| `DECNET_EMBED_SNIFFER` | bool | `false` | No | Embed MACVLAN sniffer in API process. Do not combine with `decnet sniffer --daemon`. |
+
+## Request profiling (Pyinstrument)
+
+| Name | Type | Default | Required | Consequence |
+|------|------|---------|----------|-------------|
+| `DECNET_PROFILE_REQUESTS` | bool | `false` | No | Mount Pyinstrument ASGI middleware on the FastAPI app. Writes per-request HTML flamegraphs. |
+| `DECNET_PROFILE_DIR` | path | `profiles` | No | Output directory for flamegraphs. Relative paths are relative to `$PWD`. |
+
+## API server
+
+| Name | Type | Default | Required | Consequence |
+|------|------|---------|----------|-------------|
+| `DECNET_API_HOST` | str | `127.0.0.1` | No | Bind address for the FastAPI server. |
+| `DECNET_API_PORT` | int (1–65535) | `8000` | No | TCP port for the API. |
+| `DECNET_JWT_SECRET` | str (≥32 chars) | — | **Yes** | HS256 signing secret. Missing, known-bad, or short values abort startup unless `DECNET_DEVELOPER=true` (and even then, known-bad is still rejected). |
+| `DECNET_INGEST_LOG_FILE` | path | `/var/log/decnet/decnet.log` | No | File the ingester tails for honeypot events. |
+
+## Ingester batching
+
+| Name | Type | Default | Required | Consequence |
+|------|------|---------|----------|-------------|
+| `DECNET_BATCH_SIZE` | int | `100` | No | Rows accumulated per DB commit. Larger batches reduce SQLite write-lock contention. |
+| `DECNET_BATCH_MAX_WAIT_MS` | int | `250` | No | Maximum milliseconds to wait before flushing a partial batch. Bounds latency during idle periods. |
+
+## Web dashboard
+
+| Name | Type | Default | Required | Consequence |
+|------|------|---------|----------|-------------|
+| `DECNET_WEB_HOST` | str | `127.0.0.1` | No | Bind address for the web dashboard. |
+| `DECNET_WEB_PORT` | int (1–65535) | `8080` | No | Web dashboard port. |
+| `DECNET_ADMIN_USER` | str | `admin` | No* | Admin login. `admin` is a known-bad default and is rejected at startup outside pytest. |
+| `DECNET_ADMIN_PASSWORD` | str | `admin` | No* | Admin password. Rejected if set to a known-bad value. Change both. |
+| `DECNET_DEVELOPER` | bool | `false` | No | `true` enables DEBUG logging and relaxes the JWT length check. Does not enable tracing. |
+
+*The defaults exist so imports do not crash, but the web API refuses to start
+with them in non-pytest environments.
+
+## Tracing (OpenTelemetry)
+
+Independent from `DECNET_DEVELOPER` so tracing can be toggled on its own.
+
+| Name | Type | Default | Required | Consequence |
+|------|------|---------|----------|-------------|
+| `DECNET_DEVELOPER_TRACING` | bool | `false` | No | Enable OpenTelemetry tracing for the API and workers. |
+| `DECNET_OTEL_ENDPOINT` | URL | `http://localhost:4317` | No | OTLP gRPC collector endpoint. |
+
+See [Tracing and Profiling](Tracing-and-Profiling).
+
+## Database
+
+See [Database Drivers](Database-Drivers) for the full driver matrix.
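A minimal sketch of how the database variables in this section might be resolved, assuming `DECNET_DB_URL` takes precedence over the component variables. The helper name `resolve_db_url`, the way the MySQL URL is assembled, the percent-encoding of the password, and returning `None` for SQLite are assumptions made for illustration; the real logic lives in the driver code and may differ.

```python
import os
from urllib.parse import quote


def resolve_db_url(env=os.environ):
    """Return (db_type, url) using the precedence described in this section.

    Hypothetical helper: the URL is None when no explicit or assembled
    URL applies (for example the SQLite default, which is not modelled here).
    """
    db_type = env.get("DECNET_DB_TYPE", "sqlite").lower()

    # An explicit URL wins; the component variables are ignored.
    url = env.get("DECNET_DB_URL")
    if url:
        return db_type, url

    if db_type != "mysql":
        return db_type, None

    # Documented defaults for the component variables.
    host = env.get("DECNET_DB_HOST", "localhost")
    port = int(env.get("DECNET_DB_PORT", "3306"))
    name = env.get("DECNET_DB_NAME", "decnet")
    user = env.get("DECNET_DB_USER", "decnet")
    password = env.get("DECNET_DB_PASSWORD")

    # Percent-encode the password so characters such as "@" do not break the URL.
    auth = user if password is None else f"{user}:{quote(password, safe='')}"
    return db_type, f"mysql+asyncmy://{auth}@{host}:{port}/{name}"
```

The point of the sketch is the ordering: check the full URL first, then fall back to the individual host, port, name, user, and password values.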
+
+| Name | Type | Default | Required | Consequence |
+|------|------|---------|----------|-------------|
+| `DECNET_DB_TYPE` | `sqlite` \| `mysql` | `sqlite` | No | Selects the repository subclass. Lower-cased automatically. |
+| `DECNET_DB_URL` | SQLAlchemy URL | unset | No | Full URL, e.g. `mysql+asyncmy://user:pass@host:3306/decnet`. **When set, all component vars below are ignored.** |
+| `DECNET_DB_HOST` | str | `localhost` | No | MySQL host. |
+| `DECNET_DB_PORT` | int (1–65535) | `3306` | No | MySQL port. Validated only when explicitly set. |
+| `DECNET_DB_NAME` | str | `decnet` | No | Database name. |
+| `DECNET_DB_USER` | str | `decnet` | No | DB user. |
+| `DECNET_DB_PASSWORD` | str | unset | No | DB password. `None` when unset. |
+
+## CORS
+
+| Name | Type | Default | Required | Consequence |
+|------|------|---------|----------|-------------|
+| `DECNET_CORS_ORIGINS` | CSV of URLs | `http://:` | No | Allowed origins for the dashboard API. Wildcard bind addresses (`0.0.0.0`, `127.0.0.1`, `::`) resolve to `localhost` in the default. |
+
+Example override:
+
+```bash
+DECNET_CORS_ORIGINS=http://192.168.1.50:9090,https://dashboard.example.com
+```
+
+## Starter `.env.local`
+
+Copy this to the project root as `.env.local`, change every placeholder, and
+keep it out of git.
+
+```bash
+# System logging
+DECNET_SYSTEM_LOGS=decnet.system.log
+
+# Embedded workers (leave off unless you know why)
+DECNET_EMBED_PROFILER=false
+DECNET_EMBED_SNIFFER=false
+
+# Request profiling
+DECNET_PROFILE_REQUESTS=false
+DECNET_PROFILE_DIR=profiles
+
+# API
+DECNET_API_HOST=127.0.0.1
+DECNET_API_PORT=8000
+# Generate with: python -c 'import secrets; print(secrets.token_urlsafe(48))'
+DECNET_JWT_SECRET=REPLACE_WITH_A_64_BYTE_URLSAFE_TOKEN_NOT_IN_THE_BAD_LIST
+DECNET_INGEST_LOG_FILE=/var/log/decnet/decnet.log
+
+# Ingester batching
+DECNET_BATCH_SIZE=100
+DECNET_BATCH_MAX_WAIT_MS=250
+
+# Web dashboard
+DECNET_WEB_HOST=127.0.0.1
+DECNET_WEB_PORT=8080
+DECNET_ADMIN_USER=anti
+DECNET_ADMIN_PASSWORD=REPLACE_ME_WITH_A_LONG_PASSPHRASE
+DECNET_DEVELOPER=false
+
+# Tracing
+DECNET_DEVELOPER_TRACING=false
+DECNET_OTEL_ENDPOINT=http://localhost:4317
+
+# Database (sqlite is the default; uncomment the mysql block to switch)
+DECNET_DB_TYPE=sqlite
+# DECNET_DB_TYPE=mysql
+# DECNET_DB_URL=mysql+asyncmy://decnet:REPLACE_ME@db.internal:3306/decnet
+# DECNET_DB_HOST=db.internal
+# DECNET_DB_PORT=3306
+# DECNET_DB_NAME=decnet
+# DECNET_DB_USER=decnet
+# DECNET_DB_PASSWORD=REPLACE_ME
+
+# CORS (only needed when the browser is not on the same host:port as the API)
+# DECNET_CORS_ORIGINS=http://192.168.1.50:9090,https://dashboard.example.com
+```
+
+## Notes
+
+`decnet/config.py` re-reads `DECNET_DEVELOPER` and `DECNET_SYSTEM_LOGS` during
+logging setup. Those are the same variables documented above — there are no
+others.
diff --git a/Teardown-and-State.md b/Teardown-and-State.md
new file mode 100644
index 0000000..9fbef23
--- /dev/null
+++ b/Teardown-and-State.md
@@ -0,0 +1,177 @@
+# Teardown and State
+
+DECNET keeps the whole fleet picture in a single file, `decnet-state.json`,
+at the project root. Every command that touches a running deployment
+(`decnet status`, `decnet teardown`, the web dashboard, the sniffer, the
+collector) loads it; `decnet deploy` writes it.
+
+Without this file, teardown cannot find the compose project, the sniffer
+cannot map IPs to deckies, and the collector does not know which containers
+to tail.
+
+See also: [Environment Variables](Environment-Variables),
+[Database Drivers](Database-Drivers), [Systemd](Systemd-Setup).
+
+## Layout
+
+`decnet-state.json` has exactly two top-level keys:
+
+```json
+{
+  "config": { ... DecnetConfig.model_dump() ... },
+  "compose_path": "/absolute/path/to/decnet-compose.yml"
+}
+```
+
+- `config` — the serialised `DecnetConfig` pydantic model
+  (`decnet/models.py`): `mode`, `interface`, `subnet`, `gateway`, `ipvlan`,
+  `mutate_interval`, `log_file`, and the full `deckies[]` list. Each
+  `DeckyConfig` entry carries name, IP, services, distro, base image,
+  hostname, archetype, per-service config, `nmap_os`, and rotation timestamps.
+- `compose_path` — absolute path to the generated
+  `decnet-compose.yml`. Teardown uses it as the `-f` argument to
+  `docker compose`.
+
+### Example `decnet-state.json`
+
+```json
+{
+  "config": {
+    "mode": "unihost",
+    "interface": "eth0",
+    "subnet": "192.168.1.0/24",
+    "gateway": "192.168.1.1",
+    "ipvlan": false,
+    "mutate_interval": 30,
+    "log_file": "/var/log/decnet/decnet.log",
+    "deckies": [
+      {
+        "name": "decky-01",
+        "ip": "192.168.1.201",
+        "services": ["ssh", "smb"],
+        "distro": "debian",
+        "base_image": "debian:bookworm-slim",
+        "build_base": "debian:bookworm-slim",
+        "hostname": "fileserver-02",
+        "archetype": "office-fileshare",
+        "service_config": {},
+        "nmap_os": "linux",
+        "mutate_interval": null,
+        "last_mutated": 0.0,
+        "last_login_attempt": 0.0
+      },
+      {
+        "name": "decky-02",
+        "ip": "192.168.1.202",
+        "services": ["rdp"],
+        "distro": "ubuntu22",
+        "base_image": "ubuntu:22.04",
+        "build_base": "debian:bookworm-slim",
+        "hostname": "WIN-DESK01",
+        "archetype": null,
+        "service_config": {},
+        "nmap_os": "windows",
+        "mutate_interval": null,
+        "last_mutated": 0.0,
+        "last_login_attempt": 0.0
+      }
+    ]
+  },
+  "compose_path": "/home/anti/Tools/DECNET/decnet-compose.yml"
+}
+```
+
+## API
+
+All three helpers live in `decnet/config.py`:
+
+### `save_state(config: DecnetConfig, compose_path: Path) -> None`
+
+Dumps `{"config": config.model_dump(), "compose_path": str(compose_path)}`
+as pretty-printed JSON (`indent=2`) to `STATE_FILE`
+(`/decnet-state.json`). Overwrites any existing file.
+
+Called by `decnet/engine/deployer.py::deploy` after the compose file is
+written and before `docker compose up`.
+
+### `load_state() -> tuple[DecnetConfig, Path] | None`
+
+Returns `None` when the file does not exist. Otherwise parses the JSON,
+re-hydrates `DecnetConfig`, and returns `(config, Path(compose_path))`.
+
+Callers:
+
+- `decnet/engine/deployer.py` — `teardown()` and `status()`.
+- `decnet/sniffer/worker.py` — builds the IP-to-decky-name map.
+- `decnet/collector/worker.py` — resolves the exact set of service container
+  names to tail. Wrapped in `asyncio.to_thread()` to keep the event loop clean.
+- `decnet/web/db/sqlmodel_repo.py` — uses `asyncio.to_thread(load_state)` to
+  surface deployment metadata through the dashboard API.
+
+### `clear_state() -> None`
+
+`unlink()`s the state file if present. A no-op otherwise. Called once by
+`teardown()` after `docker compose down` and host-interface cleanup succeed.
+
+## How teardown cleans host interfaces
+
+`decnet/engine/deployer.py::teardown(decky_id=None)` runs, in order:
+
+1. `load_state()`. If it returns `None`, prints
+   `No active deployment found (no decnet-state.json).` and exits.
+2. If `decky_id` is given, `docker compose stop -...` then
+   `docker compose rm -f ...` for that decky only. **No host-interface
+   cleanup and no state clear** — the rest of the fleet is still alive.
+3. If no `decky_id` (full teardown):
+   1. `docker compose down` with the `compose_path` from state.
+   2. Compute the decky IP range with `ips_to_range([d.ip for d in config.deckies])`.
+   3. Remove the host-side L2 interface:
+      - `teardown_host_ipvlan(decky_range)` when `config.ipvlan` is `true`, or
+      - `teardown_host_macvlan(decky_range)` otherwise.
+   4. `remove_macvlan_network(client)` drops the docker network.
+   5. `clear_state()` deletes `decnet-state.json`.
+   6. Logs `teardown complete` and prints the driver that was removed.
+
+If step 3 never runs (you ctrl-C'd, or one of the subprocess calls errored),
+`decnet-state.json` stays on disk and so do the host interfaces. Re-running
+`sudo decnet teardown --all` is idempotent and safe.
+
+## When you need `sudo`
+
+Anything that touches host networking — creating or removing a MACVLAN /
+IPvlan parent interface, opening a raw socket for the sniffer — needs
+`CAP_NET_ADMIN`, which in practice means `sudo`:
+
+- `sudo decnet deploy ...` — creates the host interface, writes
+  `decnet-state.json`, brings up the compose project.
+- `sudo decnet teardown` / `sudo decnet teardown --all` — removes host
+  interfaces, clears state. Without `sudo` the ip-link calls fail and the
+  state file is left behind.
+- `sudo decnet teardown --id decky-01` — still needs `sudo` if the compose
+  project was created by root.
+- `sudo decnet sniffer --daemon` — raw packet capture on the parent iface.
+
+Read-only commands that only consult `decnet-state.json` and the dashboard
+DB do not need root:
+
+- `decnet status`
+- `decnet services`
+- `decnet deploy --dry-run` (generates the compose file only)
+- `decnet api` / `decnet web` once the deployment is up — as long as the
+  state file and `DECNET_SYSTEM_LOGS` are readable by the invoking user.
+  `decnet/config.py` drops root ownership of the system log when invoked via
+  `sudo` precisely so the follow-up non-root commands can append to it.
+
+## Troubleshooting
+
+- **`No active deployment found`** — `decnet-state.json` is missing.
+  Either the deploy never completed, or a previous teardown already ran.
+- **Orphan host interfaces after a crash** — re-run
+  `sudo decnet teardown --all`. If state is gone, remove them manually with
+  `ip link del decnet-mv0` (or the ipvlan equivalent) and delete the docker
+  network.
+- **`PermissionError` writing the state file** — you ran `decnet deploy`
+  without `sudo` on a fresh checkout; the project root is not writable by
+  the current user. Either `chmod` the directory or run as root.
+- **Stale `compose_path`** — moving the project directory after deploy
+  breaks teardown. Tear down first, move, redeploy.
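The state-file lifecycle described in the API section of `Teardown-and-State.md` can be illustrated with a small stand-alone sketch. It is not the code in `decnet/config.py`: these stand-ins take an explicit `state_file` path and a plain `dict` instead of the `DecnetConfig` model, and `ip_to_name` is a hypothetical helper showing the kind of IP-to-decky map the sniffer is said to build.

```python
import json
from pathlib import Path


def save_state(state_file, config, compose_path):
    """Write the two top-level keys as pretty-printed JSON, overwriting any old file."""
    payload = {"config": config, "compose_path": str(compose_path)}
    Path(state_file).write_text(json.dumps(payload, indent=2))


def load_state(state_file):
    """Return (config, compose_path), or None when no state file exists."""
    state_file = Path(state_file)
    if not state_file.exists():
        return None
    data = json.loads(state_file.read_text())
    return data["config"], Path(data["compose_path"])


def clear_state(state_file):
    """Delete the state file if present; do nothing otherwise."""
    Path(state_file).unlink(missing_ok=True)


def ip_to_name(config):
    """Hypothetical helper: map each decky IP to its name."""
    return {decky["ip"]: decky["name"] for decky in config["deckies"]}
```

Because `load_state` returns `None` for a missing file and `clear_state` tolerates one, calling them again after a completed teardown is harmless, which is in line with the idempotent re-run behaviour described above.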