Add Environment Variables and Teardown and State wiki pages

2026-04-18 06:07:22 -04:00
parent e48b884970
commit 68bbd2e7e0
2 changed files with 367 additions and 0 deletions

Environment-Variables.md Normal file

@@ -0,0 +1,190 @@
# Environment Variables
DECNET reads configuration from process environment. On import, `decnet/env.py`
loads `.env.local` first (preferred, git-ignored) then `.env` from the project
root. Any variable already present in the shell environment wins over both
files.
Only the variables listed below are recognised. Anything else is noise.
- Source of truth: [`decnet/env.py`](https://git.resacachile.cl/anti/DECNET/src/branch/main/decnet/env.py)
- Starter template: [`env.config.example`](https://git.resacachile.cl/anti/DECNET/src/branch/main/env.config.example)
See also: [DB drivers](Database-Drivers), [Logging](Logging-and-Syslog),
[Systemd](Systemd-Setup), [Tracing](Tracing-and-Profiling).
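
The load order described at the top of this page can be pictured as a three-way merge. This is only a sketch, not the code in `decnet/env.py`: `parse_env_text` and `effective_env` are hypothetical names, the parser handles plain `KEY=VALUE` lines only, and it assumes that "preferred" means `.env.local` overrides `.env`.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def effective_env(shell: dict[str, str], env_local: str, env: str) -> dict[str, str]:
    """Merge the three sources, lowest priority first."""
    merged = parse_env_text(env)               # .env has the lowest priority
    merged.update(parse_env_text(env_local))   # .env.local overrides .env (assumed)
    merged.update(shell)                       # the shell environment wins over both
    return merged
```

With `DECNET_API_PORT` set in all three places, the shell value is the one that survives the merge.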
## Validation rules
Two validators live in `decnet/env.py`:
- `_port(name, default)` — integer in `[1, 65535]`. Applies to
`DECNET_API_PORT`, `DECNET_WEB_PORT`, `DECNET_DB_PORT`.
- `_require_env(name)` — variable must be set, and must not be a known-bad
default. Under pytest (`PYTEST*` env var present) the bad-value check is
skipped so test fixtures can use sentinel values.
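
To make the port rule concrete, here is a minimal sketch of a range-checked port lookup. `port_from_env` is a hypothetical stand-in for `_port`; it takes the environment as an explicit mapping so it can be exercised without touching `os.environ`, and the error messages are invented for illustration.

```python
from typing import Mapping


def port_from_env(env: Mapping[str, str], name: str, default: int) -> int:
    """Return the port stored under `name`, or `default` when it is unset."""
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    # Valid TCP ports are 1..65535 inclusive.
    if not 1 <= value <= 65535:
        raise ValueError(f"{name} must be in [1, 65535], got {value}")
    return value
```

For example, `port_from_env({"DECNET_API_PORT": "70000"}, "DECNET_API_PORT", 8000)` raises `ValueError`, while an empty mapping yields the default.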
### Known-bad-values block list
`_require_env` rejects these case-insensitive literals:
- `admin`
- `secret`
- `password`
- `changeme`
- `fallback-secret-key-change-me`
### JWT secret length rule
When `name == "DECNET_JWT_SECRET"`, the value must be at least **32 bytes**
(256 bits). This matches HS256's minimum key length (RFC 7518 §3.2: "A key of
the same size as the hash output [...] or larger MUST be used"). The length
check is relaxed when `DECNET_DEVELOPER=true`.
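
The three rules above (must be set, block list, JWT length) can be combined in one function. The sketch below is an illustration under stated assumptions, not the body of `_require_env`: `require_env` and `KNOWN_BAD` are hypothetical names, pytest is detected by looking for any variable whose name starts with `PYTEST`, and the length check is assumed to stay active under pytest.

```python
from typing import Mapping

# Block list from this page, stored lower-case for case-insensitive matching.
KNOWN_BAD = frozenset(
    {"admin", "secret", "password", "changeme", "fallback-secret-key-change-me"}
)


def require_env(env: Mapping[str, str], name: str) -> str:
    """Return env[name] or raise ValueError when a rule is violated."""
    value = env.get(name)
    if not value:
        raise ValueError(f"{name} must be set")
    # Assumption: any PYTEST* variable means we are running under pytest.
    under_pytest = any(key.startswith("PYTEST") for key in env)
    if not under_pytest and value.lower() in KNOWN_BAD:
        raise ValueError(f"{name} is set to a known-bad default")
    developer = env.get("DECNET_DEVELOPER", "").lower() == "true"
    if name == "DECNET_JWT_SECRET" and not developer:
        if len(value.encode("utf-8")) < 32:
            raise ValueError(f"{name} must be at least 32 bytes")
    return value
```

Measuring the UTF-8 encoded length keeps the check in bytes rather than characters, which is what the HS256 key-size rule is about.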
## System logging
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_SYSTEM_LOGS` | path | `decnet.system.log` | No | Destination for the RFC 5424 `RotatingFileHandler` installed by `decnet/config.py`. All microservice daemons (api, sniffer, profiler, collector) append here. Skipped under pytest. |
## Embedded workers
These are escape hatches — leave them unset in normal deployments. `decnet
deploy` always spawns standalone daemons, and embedding the same worker inside
the API duplicates DB writes and sniffer packets.
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_EMBED_PROFILER` | bool (`true`/other) | `false` | No | Embed profiler in API process. Do not combine with `decnet profiler --daemon`. |
| `DECNET_EMBED_SNIFFER` | bool | `false` | No | Embed MACVLAN sniffer in API process. Do not combine with `decnet sniffer --daemon`. |
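
The `true`/other convention used by the boolean variables on this page means that only the literal `true` switches a flag on. A minimal sketch, assuming case-insensitive matching and whitespace trimming (neither is confirmed by the source; `env_bool` is a hypothetical helper):

```python
from typing import Mapping


def env_bool(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Only the literal 'true' enables a flag; any other value disables it."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"
```

Under this reading, values such as `1`, `yes`, or `on` leave the flag off, so a typo never silently embeds a worker.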
## Request profiling (Pyinstrument)
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_PROFILE_REQUESTS` | bool | `false` | No | Mount Pyinstrument ASGI middleware on the FastAPI app. Writes per-request HTML flamegraphs. |
| `DECNET_PROFILE_DIR` | path | `profiles` | No | Output directory for flamegraphs. Relative paths are relative to `$PWD`. |
## API server
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_API_HOST` | str | `127.0.0.1` | No | Bind address for the FastAPI server. |
| `DECNET_API_PORT` | int (1-65535) | `8000` | No | TCP port for the API. |
| `DECNET_JWT_SECRET` | str (≥32 chars) | — | **Yes** | HS256 signing secret. Missing, known-bad, or short values abort startup unless `DECNET_DEVELOPER=true` (and even then, known-bad is still rejected). |
| `DECNET_INGEST_LOG_FILE` | path | `/var/log/decnet/decnet.log` | No | File the ingester tails for honeypot events. |
## Ingester batching
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_BATCH_SIZE` | int | `100` | No | Rows accumulated per DB commit. Larger batches reduce SQLite write-lock contention. |
| `DECNET_BATCH_MAX_WAIT_MS` | int | `250` | No | Maximum milliseconds to wait before flushing a partial batch. Bounds latency during idle periods. |
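
The interaction between the two batching knobs is easiest to see in code. The sketch below is one way to flush on "batch full or wait expired" with an `asyncio.Queue`; it is not the ingester's actual loop, and `collect_batch` is a hypothetical name.

```python
import asyncio
import time


async def collect_batch(queue: asyncio.Queue, batch_size: int = 100,
                        max_wait_ms: int = 250) -> list:
    """Return up to batch_size rows, or whatever arrived within max_wait_ms."""
    batch = [await queue.get()]            # block until the first row arrives
    deadline = time.monotonic() + max_wait_ms / 1000
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                          # wait budget spent: flush a partial batch
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break                          # queue stayed empty until the deadline
    return batch
```

The timer starts when the first row arrives, so an idle queue costs nothing, and a trickle of events is committed at most `max_wait_ms` after the first one was read.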
## Web dashboard
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_WEB_HOST` | str | `127.0.0.1` | No | Bind address for the web dashboard. |
| `DECNET_WEB_PORT` | int (1-65535) | `8080` | No | Web dashboard port. |
| `DECNET_ADMIN_USER` | str | `admin` | No* | Admin login. `admin` is a known-bad default and is rejected at startup outside pytest. |
| `DECNET_ADMIN_PASSWORD` | str | `admin` | No* | Admin password. Rejected if set to a known-bad value. Change both. |
| `DECNET_DEVELOPER` | bool | `false` | No | `true` enables DEBUG logging and relaxes the JWT length check. Does not enable tracing. |
*The defaults exist so imports do not crash, but the web API refuses to start
with them in non-pytest environments.
## Tracing (OpenTelemetry)
Independent from `DECNET_DEVELOPER` so tracing can be toggled on its own.
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_DEVELOPER_TRACING` | bool | `false` | No | Enable OpenTelemetry tracing for the API and workers. |
| `DECNET_OTEL_ENDPOINT` | URL | `http://localhost:4317` | No | OTLP gRPC collector endpoint. |
See [Tracing and Profiling](Tracing-and-Profiling).
## Database
See [Database Drivers](Database-Drivers) for the full driver matrix.
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_DB_TYPE` | `sqlite` \| `mysql` | `sqlite` | No | Selects the repository subclass. Lower-cased automatically. |
| `DECNET_DB_URL` | SQLAlchemy URL | unset | No | Full URL, e.g. `mysql+asyncmy://user:pass@host:3306/decnet`. **When set, all component vars below are ignored.** |
| `DECNET_DB_HOST` | str | `localhost` | No | MySQL host. |
| `DECNET_DB_PORT` | int (1-65535) | `3306` | No | MySQL port. Validated only when explicitly set. |
| `DECNET_DB_NAME` | str | `decnet` | No | Database name. |
| `DECNET_DB_USER` | str | `decnet` | No | DB user. |
| `DECNET_DB_PASSWORD` | str | unset | No | DB password. `None` when unset. |
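
The precedence between `DECNET_DB_URL` and the component variables can be sketched as follows. This is an illustration only: `mysql_url` is a hypothetical helper, the `mysql+asyncmy` scheme is taken from the example URL in the table, and percent-encoding the credentials is an assumption about how special characters should be handled.

```python
from typing import Mapping
from urllib.parse import quote


def mysql_url(env: Mapping[str, str]) -> str:
    """Build a MySQL SQLAlchemy URL from the DECNET_DB_* variables."""
    explicit = env.get("DECNET_DB_URL")
    if explicit:
        return explicit                     # component variables are ignored
    host = env.get("DECNET_DB_HOST", "localhost")
    port = env.get("DECNET_DB_PORT", "3306")
    name = env.get("DECNET_DB_NAME", "decnet")
    user = quote(env.get("DECNET_DB_USER", "decnet"), safe="")
    password = env.get("DECNET_DB_PASSWORD")
    # Leave the password out entirely when it is unset.
    auth = user if password is None else f"{user}:{quote(password, safe='')}"
    return f"mysql+asyncmy://{auth}@{host}:{port}/{name}"
```

Setting both `DECNET_DB_URL` and `DECNET_DB_HOST` therefore yields the URL unchanged, which matches the "all component vars are ignored" rule.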
## CORS
| Name | Type | Default | Required | Consequence |
|------|------|---------|----------|-------------|
| `DECNET_CORS_ORIGINS` | CSV of URLs | `http://<web_host>:<web_port>` | No | Allowed origins for the dashboard API. When `DECNET_WEB_HOST` is a wildcard or loopback bind address (`0.0.0.0`, `127.0.0.1`, `::`), the default uses `localhost` as the host. |
Example override:
```bash
DECNET_CORS_ORIGINS=http://192.168.1.50:9090,https://dashboard.example.com
```
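
How the default origin is derived, and how an override is split, can be sketched like this. `cors_origins` is a hypothetical helper, not the function in `decnet/env.py`; it assumes the CSV is split on commas with surrounding whitespace trimmed, and it does not bracket other IPv6 literals.

```python
from typing import Mapping

# Bind addresses that the default maps to "localhost" (from the table above).
LOCAL_BINDS = {"0.0.0.0", "127.0.0.1", "::"}


def cors_origins(env: Mapping[str, str]) -> list[str]:
    """Return the allowed origins, falling back to the dashboard's own URL."""
    raw = env.get("DECNET_CORS_ORIGINS")
    if raw:
        return [origin.strip() for origin in raw.split(",") if origin.strip()]
    host = env.get("DECNET_WEB_HOST", "127.0.0.1")
    port = env.get("DECNET_WEB_PORT", "8080")
    if host in LOCAL_BINDS:
        host = "localhost"
    return [f"http://{host}:{port}"]
```

With nothing set, the only allowed origin is `http://localhost:8080`, which is what a browser on the same machine sends.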
## Starter `.env.local`
Copy this to the project root as `.env.local`, change every placeholder, and
keep it out of git.
```bash
# System logging
DECNET_SYSTEM_LOGS=decnet.system.log
# Embedded workers (leave off unless you know why)
DECNET_EMBED_PROFILER=false
DECNET_EMBED_SNIFFER=false
# Request profiling
DECNET_PROFILE_REQUESTS=false
DECNET_PROFILE_DIR=profiles
# API
DECNET_API_HOST=127.0.0.1
DECNET_API_PORT=8000
# Generate with: python -c 'import secrets; print(secrets.token_urlsafe(48))'
DECNET_JWT_SECRET=REPLACE_WITH_A_64_BYTE_URLSAFE_TOKEN_NOT_IN_THE_BAD_LIST
DECNET_INGEST_LOG_FILE=/var/log/decnet/decnet.log
# Ingester batching
DECNET_BATCH_SIZE=100
DECNET_BATCH_MAX_WAIT_MS=250
# Web dashboard
DECNET_WEB_HOST=127.0.0.1
DECNET_WEB_PORT=8080
DECNET_ADMIN_USER=anti
DECNET_ADMIN_PASSWORD=REPLACE_ME_WITH_A_LONG_PASSPHRASE
DECNET_DEVELOPER=false
# Tracing
DECNET_DEVELOPER_TRACING=false
DECNET_OTEL_ENDPOINT=http://localhost:4317
# Database (sqlite is the default; uncomment the mysql block to switch)
DECNET_DB_TYPE=sqlite
# DECNET_DB_TYPE=mysql
# DECNET_DB_URL=mysql+asyncmy://decnet:REPLACE_ME@db.internal:3306/decnet
# DECNET_DB_HOST=db.internal
# DECNET_DB_PORT=3306
# DECNET_DB_NAME=decnet
# DECNET_DB_USER=decnet
# DECNET_DB_PASSWORD=REPLACE_ME
# CORS (only needed when the browser is not on the same host:port as the API)
# DECNET_CORS_ORIGINS=http://192.168.1.50:9090,https://dashboard.example.com
```
## Notes
`decnet/config.py` re-reads `DECNET_DEVELOPER` and `DECNET_SYSTEM_LOGS` during
logging setup. Those are the same variables documented above — there are no
others.

Teardown-and-State.md Normal file

@@ -0,0 +1,177 @@
# Teardown and State
DECNET keeps the whole fleet picture in a single file, `decnet-state.json`,
at the project root. Every command that touches a running deployment
(`decnet status`, `decnet teardown`, the web dashboard, the sniffer, the
collector) loads it; `decnet deploy` writes it.
Without this file, teardown cannot find the compose project, the sniffer
cannot map IPs to deckies, and the collector does not know which containers
to tail.
See also: [Environment Variables](Environment-Variables),
[Database Drivers](Database-Drivers), [Systemd](Systemd-Setup).
## Layout
`decnet-state.json` has exactly two top-level keys:
```json
{
"config": { ... DecnetConfig.model_dump() ... },
"compose_path": "/absolute/path/to/decnet-compose.yml"
}
```
- `config` — the serialised `DecnetConfig` pydantic model
(`decnet/models.py`): `mode`, `interface`, `subnet`, `gateway`, `ipvlan`,
`mutate_interval`, `log_file`, and the full `deckies[]` list. Each
`DeckyConfig` entry carries name, IP, services, distro, base image,
hostname, archetype, per-service config, `nmap_os`, and rotation timestamps.
- `compose_path` — absolute path to the generated
`decnet-compose.yml`. Teardown uses it as the `-f` argument to
`docker compose`.
### Example `decnet-state.json`
```json
{
"config": {
"mode": "unihost",
"interface": "eth0",
"subnet": "192.168.1.0/24",
"gateway": "192.168.1.1",
"ipvlan": false,
"mutate_interval": 30,
"log_file": "/var/log/decnet/decnet.log",
"deckies": [
{
"name": "decky-01",
"ip": "192.168.1.201",
"services": ["ssh", "smb"],
"distro": "debian",
"base_image": "debian:bookworm-slim",
"build_base": "debian:bookworm-slim",
"hostname": "fileserver-02",
"archetype": "office-fileshare",
"service_config": {},
"nmap_os": "linux",
"mutate_interval": null,
"last_mutated": 0.0,
"last_login_attempt": 0.0
},
{
"name": "decky-02",
"ip": "192.168.1.202",
"services": ["rdp"],
"distro": "ubuntu22",
"base_image": "ubuntu:22.04",
"build_base": "debian:bookworm-slim",
"hostname": "WIN-DESK01",
"archetype": null,
"service_config": {},
"nmap_os": "windows",
"mutate_interval": null,
"last_mutated": 0.0,
"last_login_attempt": 0.0
}
]
},
"compose_path": "/home/anti/Tools/DECNET/decnet-compose.yml"
}
```
## API
All three helpers live in `decnet/config.py`:
### `save_state(config: DecnetConfig, compose_path: Path) -> None`
Dumps `{"config": config.model_dump(), "compose_path": str(compose_path)}`
as pretty-printed JSON (`indent=2`) to `STATE_FILE`
(`<project root>/decnet-state.json`). Overwrites any existing file.
Called by `decnet/engine/deployer.py::deploy` after the compose file is
written and before `docker compose up`.
### `load_state() -> tuple[DecnetConfig, Path] | None`
Returns `None` when the file does not exist. Otherwise parses the JSON,
re-hydrates `DecnetConfig`, and returns `(config, Path(compose_path))`.
Callers:
- `decnet/engine/deployer.py`: `teardown()` and `status()`.
- `decnet/sniffer/worker.py` — builds the IP-to-decky-name map.
- `decnet/collector/worker.py` — resolves the exact set of service container
names to tail. Wrapped in `asyncio.to_thread()` to keep the event loop clean.
- `decnet/web/db/sqlmodel_repo.py` — uses `asyncio.to_thread(load_state)` to
surface deployment metadata through the dashboard API.
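
The two lookups the sniffer and collector need can be derived directly from the parsed state. The sketch below works on the raw JSON dict rather than on `DecnetConfig`; `ip_to_decky` and `service_names` are hypothetical names, and the `<decky>-<svc>` naming is taken from the teardown commands on this page (whether container names match it exactly is an assumption).

```python
def ip_to_decky(state: dict) -> dict[str, str]:
    """Map each decky IP to its name, as the sniffer needs."""
    return {d["ip"]: d["name"] for d in state["config"]["deckies"]}


def service_names(state: dict) -> list[str]:
    """List every <decky>-<svc> name, as the collector needs."""
    return [
        f'{d["name"]}-{svc}'
        for d in state["config"]["deckies"]
        for svc in d["services"]
    ]
```

Both functions are pure, so they can be applied to the result of `json.load` on `decnet-state.json` without importing DECNET itself.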
### `clear_state() -> None`
`unlink()`s the state file if present. A no-op otherwise. Called once by
`teardown()` after `docker compose down` and host-interface cleanup succeed.
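
The contract of the three helpers can be reproduced with the standard library alone. This is a sketch, not the code in `decnet/config.py`: it uses a plain dict where the real helpers use `DecnetConfig`, and the extra `state_file` parameter is hypothetical, added so the example can run against a temporary directory.

```python
import json
from pathlib import Path
from typing import Optional


def save_state(config: dict, compose_path: Path, state_file: Path) -> None:
    """Write the two-key state document, overwriting any existing file."""
    payload = {"config": config, "compose_path": str(compose_path)}
    state_file.write_text(json.dumps(payload, indent=2))


def load_state(state_file: Path) -> Optional[tuple[dict, Path]]:
    """Return (config, compose_path), or None when no deployment is recorded."""
    if not state_file.exists():
        return None
    data = json.loads(state_file.read_text())
    return data["config"], Path(data["compose_path"])


def clear_state(state_file: Path) -> None:
    """Delete the state file; a no-op when it is already gone."""
    state_file.unlink(missing_ok=True)
```

Calling `clear_state` twice in a row is harmless, which is what makes a repeated teardown safe at this step.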
## How teardown cleans host interfaces
`decnet/engine/deployer.py::teardown(decky_id=None)` runs, in order:
1. `load_state()`. If it returns `None`, prints
`No active deployment found (no decnet-state.json).` and exits.
2. If `decky_id` is given, `docker compose stop <decky>-<svc>...` then
`docker compose rm -f ...` for that decky only. **No host-interface
cleanup and no state clear** — the rest of the fleet is still alive.
3. If no `decky_id` (full teardown):
1. `docker compose down` with the `compose_path` from state.
2. Compute the decky IP range with `ips_to_range([d.ip for d in config.deckies])`.
3. Remove the host-side L2 interface:
- `teardown_host_ipvlan(decky_range)` when `config.ipvlan` is `true`, or
- `teardown_host_macvlan(decky_range)` otherwise.
4. `remove_macvlan_network(client)` drops the docker network.
5. `clear_state()` deletes `decnet-state.json`.
6. Logs `teardown complete` and prints the driver that was removed.
If step 3 is interrupted (you pressed Ctrl-C, or one of the subprocess calls
errored), `decnet-state.json` stays on disk and host interfaces may be left
behind. Re-running `sudo decnet teardown --all` is idempotent and safe.
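
The branching in the sequence above can be summarised as a function that returns the ordered steps without executing anything. It is a reading aid, not the deployer's code: `teardown_plan` is a hypothetical name, the config is a plain dict, and the non-compose steps are shown as labels because their implementations are not described on this page.

```python
from typing import Optional


def teardown_plan(config: dict, compose_path: str,
                  decky_id: Optional[str] = None) -> list[str]:
    """Return the ordered teardown steps for one decky or the whole fleet."""
    compose = f"docker compose -f {compose_path}"
    if decky_id is not None:
        decky = next((d for d in config["deckies"] if d["name"] == decky_id), None)
        if decky is None:
            raise ValueError(f"unknown decky: {decky_id}")
        services = " ".join(f"{decky_id}-{svc}" for svc in decky["services"])
        # Single-decky path: no host-interface cleanup and no state clear.
        return [f"{compose} stop {services}", f"{compose} rm -f {services}"]
    driver = "ipvlan" if config["ipvlan"] else "macvlan"
    return [
        f"{compose} down",
        "ips_to_range(decky IPs)",
        f"teardown_host_{driver}(decky_range)",
        "remove_macvlan_network(client)",
        "clear_state()",
    ]
```

Because `clear_state()` is the last entry of the full plan, any failure before it leaves the state file in place for a retry.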
## When you need `sudo`
Anything that touches host networking — creating or removing a MACVLAN /
IPvlan parent interface, opening a raw socket for the sniffer — needs
`CAP_NET_ADMIN`, which in practice means `sudo`:
- `sudo decnet deploy ...` — creates the host interface, writes
`decnet-state.json`, brings up the compose project.
- `sudo decnet teardown` / `sudo decnet teardown --all` — removes host
interfaces, clears state. Without `sudo` the ip-link calls fail and the
state file is left behind.
- `sudo decnet teardown --id decky-01` — still needs `sudo` if the compose
project was created by root.
- `sudo decnet sniffer --daemon` — raw packet capture on the parent iface.
Read-only commands that only consult `decnet-state.json` and the dashboard
DB do not need root:
- `decnet status`
- `decnet services`
- `decnet deploy --dry-run` (generates the compose file only)
- `decnet api` / `decnet web` once the deployment is up — as long as the
state file and `DECNET_SYSTEM_LOGS` are readable by the invoking user.
`decnet/config.py` drops root ownership of the system log when invoked via
`sudo` precisely so the follow-up non-root commands can append to it.
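
One common way to hand a root-created file back to the invoking user is to read the `SUDO_UID` and `SUDO_GID` variables that `sudo` exports. Whether `decnet/config.py` does it this way is an assumption; `invoking_user` and `hand_back` are hypothetical names, and the sketch is Unix-only.

```python
import os
from typing import Mapping, Optional


def invoking_user(env: Mapping[str, str]) -> Optional[tuple[int, int]]:
    """Return (uid, gid) of the user who ran sudo, or None outside sudo."""
    uid, gid = env.get("SUDO_UID"), env.get("SUDO_GID")
    if uid is None or gid is None:
        return None
    return int(uid), int(gid)


def hand_back(path: str, env: Mapping[str, str] = os.environ) -> bool:
    """chown `path` to the invoking user; return True when ownership changed."""
    owner = invoking_user(env)
    if owner is None or os.geteuid() != 0:
        return False                      # not under sudo, or not actually root
    os.chown(path, *owner)
    return True
```

After such a call on the system log, a later non-root `decnet status` can still append to the same file.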
## Troubleshooting
- **`No active deployment found`** — `decnet-state.json` is missing.
Either the deploy never completed, or a previous teardown already ran.
- **Orphan host interfaces after a crash** — re-run
`sudo decnet teardown --all`. If state is gone, remove them manually with
`ip link del decnet-mv0` (or the ipvlan equivalent) and delete the docker
network.
- **`PermissionError` writing the state file** — you ran `decnet deploy`
without `sudo` on a fresh checkout; the project root is not writable by
the current user. Either `chmod` the directory or run as root.
- **Stale `compose_path`** — moving the project directory after deploy
breaks teardown. Tear down first, move, redeploy.
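
Two of the cases above (missing state, stale `compose_path`) can be checked before running teardown. The following is a small diagnostic sketch, not a DECNET command; `diagnose_state` is a hypothetical helper that reads the JSON directly.

```python
import json
from pathlib import Path


def diagnose_state(state_file: Path) -> list[str]:
    """Return human-readable problems found in the state file, if any."""
    if not state_file.exists():
        return ["state file missing: deploy never completed or teardown already ran"]
    problems = []
    data = json.loads(state_file.read_text())
    compose_path = Path(data["compose_path"])
    if not compose_path.exists():
        problems.append(f"stale compose_path: {compose_path} does not exist")
    return problems
```

An empty list means the state file exists and still points at a compose file on disk.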