docs(mazenet): add MazeNET wiki page + sidebar entry
373
MazeNET.md
Normal file
@@ -0,0 +1,373 @@

# MazeNET — Nested Network of Networks

MazeNET is DECNET's recursive deception topology. Instead of a flat fleet
of deckies on one LAN, MazeNET produces a **DAG of segmented LANs** with
multi-homed "bridge deckies" forwarding L3 between them — a DMZ at the
edge, internal segments behind it, and, with cross-edges enabled, pivot
paths a patient attacker can chase deeper into the maze.

A flat deployment burns an attacker for minutes. A nested topology burns
them for hours.

See also: [CLI reference](CLI-Reference),
[Deployment modes](Deployment-Modes),
[Networking: MACVLAN and IPVLAN](Networking-MACVLAN-IPVLAN),
[Teardown](Teardown-and-State),
[Logging and syslog](Logging-and-Syslog).

---

## When to use MazeNET

Use MazeNET when:

- You are deploying on a VPS or a single dedicated box and want the
  appearance of a **segmented internal network** (DMZ → services →
  internal) behind the public IP.
- You want attackers to **pivot** — discover one decky, enumerate it,
  find a foothold into a deeper LAN, repeat.
- You want per-LAN isolation so a compromise of one decky can't reach
  sibling segments without going through a bridge you control.

Stick with flat UNIHOST mode (see [Deployment
Modes](Deployment-Modes)) when:

- You only need a handful of deckies on a shared LAN.
- You want LAN-peer realism (deckies reachable from your existing
  workstations over MACVLAN/IPVLAN). MazeNET uses plain Docker bridge
  networks — it does not hand IPs out onto your real LAN.

---

## Concepts

### LAN

A MazeNET LAN is one plain Docker bridge network. Each LAN has a
`/24` subnet carved from the configured base prefix (default
`172.20.X.0/24`, one LAN per octet). LANs are arranged as a tree of
configurable depth and branching factor.

- **LAN-00** is always the DMZ root, `is_dmz=True`, publicly routable
  via the host's default bridge egress.
- Every other LAN is created with Docker's `--internal` flag — no
  host-level default egress. The only way out is through a bridge
  decky.
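
The subnet-carving rule above can be sketched in a few lines. This is an illustrative helper, not the actual decnet generator code — the function name and signature are assumptions:

```python
# Hypothetical sketch of the per-LAN subnet carving rule described
# above: one /24 per LAN index under the base prefix.
import ipaddress


def lan_subnet(idx: int, base_prefix: str = "172.20") -> str:
    """Return the /24 for LAN number `idx` (LAN-00 is the DMZ)."""
    if not 0 <= idx <= 255:
        raise ValueError(f"LAN index {idx} does not fit in a /16 base")
    subnet = f"{base_prefix}.{idx}.0/24"
    ipaddress.ip_network(subnet)  # validate that the result parses
    return subnet


print(lan_subnet(0))  # 172.20.0.0/24 — the DMZ
print(lan_subnet(3))  # 172.20.3.0/24
```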

### Decky

Same concept as flat mode: a "base" container holds the LAN IPs and
any service containers share its network namespace via
`network_mode: service:<base>`. One base, N service containers.

In MazeNET a decky is identified by a UUID (table `topology_deckies`)
and is scoped to one topology. Names are unique **within** a topology;
two different topologies can both have a `decky-001`.

### Bridge decky

A decky multi-homed onto ≥2 LANs. Every non-DMZ LAN has **exactly one
parent bridge** — a decky on that LAN that's also given an IP in the
parent LAN. That's how packets leave a segment.

If `bridge_forward_probability` rolls true for that bridge, the base
container gets `net.ipv4.ip_forward=1` (compose-level sysctl) and
`NET_ADMIN`, turning the bridge into an actual router. If it rolls
false, the bridge is multi-homed but will not forward — the attacker
must find a forwarder or own the bridge container itself.

### Cross-edges — tree vs DAG

With `cross_edge_probability=0` (default) the topology is a pure tree:
each non-DMZ LAN has exactly one parent bridge and no other inter-LAN
connections.

With `cross_edge_probability > 0` the generator rolls per LAN; on a
hit it multi-homes a random decky to a non-parent, non-child, non-self
peer LAN. This is where the DAG comes from. The data model and the
teardown path have supported DAGs from day one — cross-edges just
exercise the code that's already there.

### Determinism

The generator is seeded (`TopologyConfig.seed`). Same seed + same
config ⇒ bit-identical LAN layout, decky names, service assignments,
and edges. Persistence stores the full config snapshot so you can
regenerate or audit exactly what was deployed.
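
The guarantee is the standard seeded-RNG pattern: all randomness flows through one RNG constructed from the seed. A minimal sketch, using Python's stdlib RNG (the function name and shape are illustrative, not the real decnet API):

```python
# Sketch of the determinism guarantee: one seeded RNG drives every
# probabilistic decision, so identical seed + config replays the same
# topology bit for bit.
import random


def roll_forwarding(seed: int, bridge_count: int = 4,
                    forward_p: float = 0.5) -> list[bool]:
    rng = random.Random(seed)  # the only source of randomness
    return [rng.random() < forward_p for _ in range(bridge_count)]


# Same seed + same config => identical rolls, run after run.
assert roll_forwarding(42) == roll_forwarding(42)
# A different seed is free to differ.
print(roll_forwarding(42), roll_forwarding(7))
```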

---

## Status lifecycle

Every topology carries a `status` column with this state machine:

```
pending ─────► deploying ─────► active ◄────► degraded
   │               │               │             │
   │               ├──► failed ────┤             │
   │               │               ▼             │
   │               └─────────► tearing_down ◄───┘
   │                               ▼
   └──────────────────────────► torn_down
```

- `pending` — persisted plan, no Docker state yet.
- `deploying` — bridge networks being created, compose coming up.
- `active` — healthy and serving.
- `failed` — deploy aborted; partial state may remain on the daemon.
  Legal successor: `tearing_down`.
- `degraded` — **schema-reserved** for the future Healer. No v1 code
  path reaches it. Treat it as read-only.
- `tearing_down` — compose down + network removal in progress.
- `torn_down` — terminal. No legal successor.

Every transition writes a row to `topology_status_events`
(from/to/when/reason) — an audit log you can query later.
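
The audit-log pattern is append-only rows, queried in insertion order. A minimal sketch with stdlib `sqlite3` — the column names here are reconstructed from the from/to/when/reason description, not copied from the real schema in `decnet/web/db/models.py`:

```python
# Append-only status audit log, queried in insertion order.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE topology_status_events (
        topology_id TEXT,
        from_status TEXT,
        to_status   TEXT,
        at          TEXT DEFAULT CURRENT_TIMESTAMP,
        reason      TEXT
    )
""")
events = [
    ("9b1e", "pending", "deploying", "operator ran deploy"),
    ("9b1e", "deploying", "active", "compose up succeeded"),
]
con.executemany(
    "INSERT INTO topology_status_events "
    "(topology_id, from_status, to_status, reason) VALUES (?, ?, ?, ?)",
    events,
)
# rowid preserves insertion order even when timestamps collide.
for row in con.execute(
    "SELECT from_status, to_status, reason FROM topology_status_events "
    "WHERE topology_id = ? ORDER BY rowid",
    ("9b1e",),
):
    print(row)
```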

Illegal transitions raise `TopologyStatusError` from
`decnet.topology.status.assert_transition`. There is no `force`
escape hatch; transitions are enforced everywhere.
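
A guard like `assert_transition` boils down to a lookup in an allowed-edge map. The sketch below is reconstructed from the state machine described on this page — the map and exception body are assumptions, not the actual `decnet.topology.status` source:

```python
# Transition guard sketch: legal edges reconstructed from the diagram
# and prose above; illegal moves raise instead of silently proceeding.
class TopologyStatusError(Exception):
    pass


ALLOWED = {
    "pending":      {"deploying", "torn_down"},
    "deploying":    {"active", "failed", "tearing_down"},
    "active":       {"degraded", "tearing_down"},
    "failed":       {"tearing_down"},
    "degraded":     {"active", "tearing_down"},
    "tearing_down": {"torn_down"},
    "torn_down":    set(),  # terminal — no legal successor
}


def assert_transition(current: str, target: str) -> None:
    if target not in ALLOWED.get(current, set()):
        raise TopologyStatusError(f"illegal transition {current} -> {target}")


assert_transition("pending", "deploying")      # fine
try:
    assert_transition("torn_down", "active")   # terminal state
except TopologyStatusError as e:
    print(e)
```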

---

## Schema

Five new SQLModel tables live in `decnet/web/db/models.py`. They
coexist with `DeckyShard` (SWARM mode); flat/SWARM deployments do not
touch MazeNET tables and vice versa.

| Table | Purpose |
|---|---|
| `topologies` | One row per topology. Carries `status`, `config_snapshot` (the full `TopologyConfig` including `seed`), `created_at`, `status_changed_at`. |
| `lans` | One row per LAN. `subnet`, `is_dmz`, `(topology_id, name)` unique, `docker_network_id` populated at deploy. |
| `topology_deckies` | One row per decky. UUID PK, `decky_config` blob holds `ips_by_lan` + `forwards_l3`, `(topology_id, name)` unique. |
| `topology_edges` | `(decky_uuid, lan_id)` membership. `is_bridge=True` iff the decky appears on ≥2 LANs; `forwards_l3` flag mirrored from the decky. |
| `topology_status_events` | Audit log — one row per status transition with `reason` text. |

Repository methods land on the shared `SQLModelRepository` base, so
both SQLite and MySQL backends get them for free. Never import a
backend directly; use `get_repository()` (see [Database
Drivers](Database-Drivers)).

---

## CLI walkthrough

MazeNET commands live under `decnet topology`. The group is
**master-only** — hidden on agents via `MASTER_ONLY_GROUPS`.

### 1. Generate a plan

```bash
decnet topology generate \
  --name corp-decoy \
  --depth 3 \
  --branching 2 \
  --deckies-per-lan 1-3 \
  --cross-edge-p 0.15 \
  --seed 42
```

Writes a new `topologies` row in `pending` status and all the LAN /
decky / edge children. No Docker calls, no containers. Prints:

```
Topology persisted as pending — id=9b1e...
LANs: 8  deckies: 14  edges: 16
```

Flags:

```
--name <str>               Topology label. Required.
--depth <1..16>            Max tree depth from the DMZ.
--branching <1..8>         Max child LANs per non-leaf LAN.
--deckies-per-lan MIN-MAX  Range per LAN, e.g. 1-3.
--bridge-forward-p 0..1    P(bridge forwards L3). default: 1.0
--cross-edge-p 0..1        P(non-DMZ LAN adds a DAG cross-edge). default: 0.0
--services a,b,c           Fixed service set (bypasses --randomize-services).
--randomize-services       Default: true. Pick 1–3 random services per decky.
--seed <int>               Deterministic RNG. Same seed ⇒ same topology.
```
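
A quick back-of-envelope check on `--depth` and `--branching`: since both are maxima, a full tree is the upper bound on LAN count. The helper below is illustrative and assumes the DMZ sits at depth 0, with up to `branching**d` LANs at depth `d`:

```python
# Upper bound on LAN count for a full tree of the given depth and
# branching factor (both CLI flags are maxima, so real topologies
# come in at or under this).
def max_lans(depth: int, branching: int) -> int:
    # DMZ is depth 0; a full tree has branching**d LANs at depth d.
    return sum(branching ** d for d in range(depth + 1))


print(max_lans(3, 2))  # 1 + 2 + 4 + 8 = 15 possible LANs at most
```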

### 2. List

```bash
decnet topology list
```

Table of id, name, mode, status, created_at for every persisted
topology. Empty when there are none.

### 3. Show

```bash
decnet topology show 9b1e1234-5678-...
```

Structured text rendering — LAN-by-LAN, each LAN's deckies with IP,
services, and `(bridge, L3-forward)` tags where applicable. No ASCII
art; visual DAG rendering belongs in the web dashboard (see
[Web-Dashboard](Web-Dashboard)).

Example:

```
corp-decoy  id=9b1e1234-...  status=pending  mode=unihost

LAN LAN-00  172.20.0.0/24  (DMZ)
  • decky-001  172.20.0.2  svcs=ssh,http

LAN LAN-01  172.20.1.0/24
  • decky-002  172.20.1.2  svcs=smb  (bridge, L3-forward)
  • decky-003  172.20.1.3  svcs=ftp

LAN LAN-02  172.20.2.0/24
  • decky-002  172.20.2.2  svcs=smb  (bridge, L3-forward)
  • decky-004  172.20.2.3  svcs=rdp
...
```

### 4. Deploy

```bash
sudo decnet topology deploy 9b1e1234-5678-...
```

Runs the engine deployer. For a `pending` topology:

1. Transition to `deploying`.
2. Create one plain Docker bridge network per LAN
   (`decnet_t_<tidprefix>_lan-NN`). The DMZ LAN is regular; internal
   LANs are created with `--internal`.
3. Write a per-topology compose file (`decnet-topology-<tid>-compose.yml`).
   Each decky's base lists every LAN it's on with a per-LAN
   `ipv4_address`. Bridge deckies with `forwards_l3=True` get
   `sysctls: {net.ipv4.ip_forward: 1}` + `cap_add: [NET_ADMIN]`.
4. `docker compose up --build -d` (with retry on transient errors).
5. Transition to `active`.

On exception the topology is transitioned to `failed` with the error
text in the status event's `reason`. Partial Docker state is left in
place so you can tear it down cleanly.
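
To make step 3 concrete, here is a sketch of the compose service dict a bridge decky's base might get. The field names (`networks`, `ipv4_address`, `sysctls`, `cap_add`) follow the compose file format; the builder function itself is hypothetical — the real generator lives in `decnet/topology/compose.py`:

```python
# Illustrative builder for a bridge decky's base service entry: one
# static IP per attached LAN, plus forwarding sysctl/capability only
# when forwards_l3 rolled true.
def bridge_base_service(name: str, ips_by_lan: dict[str, str],
                        forwards_l3: bool) -> dict:
    service = {
        "container_name": name,
        "networks": {
            lan: {"ipv4_address": ip} for lan, ip in ips_by_lan.items()
        },
    }
    if forwards_l3:
        service["sysctls"] = {"net.ipv4.ip_forward": 1}
        service["cap_add"] = ["NET_ADMIN"]
    return service


svc = bridge_base_service(
    "decnet_t_abcd1234_decky-002",
    {"decnet_t_abcd1234_lan-01": "172.20.1.2",
     "decnet_t_abcd1234_lan-02": "172.20.2.2"},
    forwards_l3=True,
)
print(svc["cap_add"])  # ['NET_ADMIN']
```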

Dry-run mode writes the compose file and exits without touching
Docker or the topology's status:

```bash
decnet topology deploy <id> --dry-run
```

Use `--dry-run` to diff the compose output against a previous deploy
or to sanity-check the plan before committing networks.

### 5. Teardown

```bash
sudo decnet topology teardown 9b1e1234-5678-...
```

Legal from any of `active`, `degraded`, `failed`, or `deploying`. Runs:

1. Transition to `tearing_down`.
2. `docker compose down --remove-orphans` (best effort — continues on
   failure so a half-deployed topology can still be cleaned).
3. Remove each LAN's Docker bridge network in **leaf-first** order
   (LAN names are BFS-numbered, so reverse-name order is a valid
   topological sort).
4. Delete the per-topology compose file.
5. Transition to `torn_down`.

`torn_down` is terminal. The repo row is kept for audit; to purge it
outright, call `repo.delete_topology_cascade(topology_id)` from code
(no CLI wrapper by design — deletes are destructive).
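
The leaf-first trick in step 3 is worth spelling out: because LAN names are assigned in BFS order from the DMZ, a parent's name always sorts before its children's, so plain reverse name order removes leaves first. A sketch (mirrors `_teardown_order` in spirit; the variable names are illustrative):

```python
# Reverse lexicographic order over BFS-numbered LAN names is a valid
# topological sort: no parent network is removed before its children.
lans = ["lan-00", "lan-01", "lan-02", "lan-03", "lan-04"]

teardown_order = sorted(lans, reverse=True)
print(teardown_order)  # ['lan-04', 'lan-03', 'lan-02', 'lan-01', 'lan-00']
assert teardown_order[-1] == "lan-00"  # the DMZ root goes last
```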

---

## What a deployed topology looks like on the host

```bash
# One bridge network per LAN, all prefixed decnet_t_<tid_prefix>_.
docker network ls --filter name=decnet_t_

# Every decky base + its services as containers. Base containers are
# named decnet_t_<tid_prefix>_decky-NNN; services share the base's
# netns.
docker ps --filter name=decnet_t_

# Inside a bridge decky's base, two interfaces (one per LAN).
docker exec decnet_t_abcd1234_decky-002 ip -br addr

# ip_forward enabled on L3 forwarders.
docker exec decnet_t_abcd1234_decky-002 sysctl net.ipv4.ip_forward
# net.ipv4.ip_forward = 1

# Ping a deep LAN decky from the DMZ. With L3 forwarders in between,
# this succeeds — the attacker can reach it too.
docker exec decnet_t_abcd1234_decky-001 ping -c1 172.20.3.2
```

Logs follow the standard DECNET pipeline. Each decky's service
containers write RFC 5424 to stdout; the host's `decnet collect`
worker tails `docker logs` and appends to
`DECNET_INGEST_LOG_FILE` — no changes to the collector are needed
for MazeNET. See [Logging and Syslog](Logging-and-Syslog).

---

## Known limitations (v1)

- **Single-host only.** MazeNET topologies do not span SWARM workers
  — no overlay networks, no VXLAN. One box, one maze. Cross-host
  topologies are phase 2.
- **No Healer.** `degraded` is schema-reserved but unreachable. A
  container crashing leaves the topology in `active` until you notice
  and tear down. A reconciliation worker is phase 2.
- **No mutation.** Topologies are static after deploy. You cannot
  add/remove LANs or rewire bridges without a full teardown +
  regenerate. The Mutator ([Mutation and Randomization](Mutation-and-Randomization))
  does not touch MazeNET.
- **No per-hop latency shaping.** Bridge deckies forward at wire
  speed. `tc netem` per hop (to simulate WAN links) is phase 2.
- **No web UI yet.** Generate, list, show, deploy, and teardown are
  all CLI. Dashboard integration — including the visual DAG — is on
  the roadmap (see [Roadmap](Roadmap-and-Known-Debt)).
- **IP base cap.** The default `172.20.X.0/24` base prefix caps a
  topology at 256 LANs (idx > 255 raises). That is enough for most
  configurations, though a fully branched `depth=16, branching=8`
  tree would exceed it — and don't set `subnet_base_prefix` to
  something tighter expecting a large topology to still fit.

---

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `topology deploy` immediately raises `TopologyStatusError` | Topology is already `active`/`failed`/`torn_down` — deploy is only legal from `pending` | `decnet topology list` to check the status; run `teardown` first if appropriate |
| `teardown` raises `TopologyStatusError` | Already `torn_down`, or tried to tear down from `pending` | A `pending` topology has no Docker state — it moves straight to `torn_down` without a `teardown` run. If the row already shows `torn_down`, there's nothing to do |
| Deploy fails with `create_bridge_network` errors about subnet overlap | A previous deploy of the same topology left networks behind, or another topology used the same `172.20.X.0/24` | `docker network ls --filter name=decnet_t_` and remove stragglers by name; teardown is idempotent — run it again |
| Bridge decky can't forward packets between LANs | `forwards_l3` rolled false for this bridge | By design. Check `decnet topology show <id>` — non-forwarding bridges are tagged `(bridge)` without the `L3-forward` tag. Regenerate with `--bridge-forward-p 1.0` if you want every bridge forwarding |
| Attacker can't reach deep LANs from DMZ | An intermediate bridge is not forwarding, or a LAN in the path is `--internal` with no forwarder in its direct parent | `docker exec <bridge-base> sysctl net.ipv4.ip_forward` should print `1` on every bridge along the path |
| Two topologies clash on LAN subnets | Both were generated with the default `subnet_base_prefix=172.20` | Changing `--seed` is not enough — set a different base prefix via INI/config. The subnet base prefix is per-topology and must not overlap with anything else on the box |

---

## Where the code lives

| Module | Role |
|---|---|
| `decnet/topology/config.py` | `TopologyConfig` Pydantic model + dataclass records for the generator. |
| `decnet/topology/generator.py` | Deterministic plan generator. Tree first, then overlay cross-edges. |
| `decnet/topology/status.py` | `TopologyStatus` constants + `assert_transition` state machine. |
| `decnet/topology/persistence.py` | `persist`, `hydrate`, `transition_status` — repo adapter. |
| `decnet/topology/compose.py` | Per-topology compose-file generator. |
| `decnet/engine/deployer.py` | `deploy_topology`, `teardown_topology`, `_teardown_order`. |
| `decnet/cli/topology.py` | `decnet topology {generate,list,show,deploy,teardown}`. |
| `decnet/web/db/models.py` | Five MazeNET SQLModel tables + request DTOs. |
| `tests/topology/` | Generator determinism, status machine, persistence roundtrip, compose generation, deploy/failure paths, live Docker e2e. |

Test coverage is enforced by the repo's 91%+ coverage floor. Run
`pytest tests/topology/ -m "not live"` for the fast suite; add
`-m live` to exercise the Docker-daemon path (skipped on CI).

@@ -18,6 +18,7 @@

- [Networking-MACVLAN-IPVLAN](Networking-MACVLAN-IPVLAN)
- [Deployment-Modes](Deployment-Modes)
- [SWARM-Mode](SWARM-Mode)
- [MazeNET](MazeNET)
- [Remote-Updates](Remote-Updates)
- [Environment-Variables](Environment-Variables)
- [Teardown-and-State](Teardown-and-State)