docs(wiki): add Tailscale global deployment guide

Playbook for taking DECNET into the wild on geographically dispersed
VPSes with management bound to tailscale0 and decoys on the public NIC.
Covers bind cheat sheet, ACL recipe, firewall belt-and-suspenders,
SWARM + UNIHOST walkthroughs, live-QA hardening checklist, and
troubleshooting.
2026-04-26 01:58:22 -04:00
parent 02ab14697f
commit a574fde3a2
2 changed files with 456 additions and 0 deletions

@@ -0,0 +1,455 @@
# Tailscale Global Deployment
This page is the playbook for taking DECNET into the wild — cheap VPSes
scattered across regions, real attackers hitting them, telemetry flowing
home — without leaving your management plane on the public internet.
The rule is simple:

> **Decoy services bind to the public NIC. Everything else binds to
> Tailscale.**

Attackers reach `sshd:22`, `smbd:445`, `httpd:80`, etc. on the box's
public address. The dashboard, REST API, SWARM control plane, log
listener, agent, updater, and any management SSH port (2222, 22222,
whatever you use) are not reachable except via your tailnet. Port scans
of the host find decoys and nothing else.

This is not a replacement for [PKI and mTLS](PKI-and-mTLS); control-plane
mTLS stays on, always. Tailscale is the **network gate that drops the
packet before TLS even sees it**. Defense in depth: an attacker would
have to first land on your tailnet (WireGuard + your IdP) *and then*
present a CA-signed cert. Either layer alone would stop an outsider;
running both is the policy.

See also: [SWARM Mode](SWARM-Mode), [Deployment Modes](Deployment-Modes),
[Networking: MACVLAN and IPVLAN](Networking-MACVLAN-IPVLAN),
[Security and Stealth](Security-and-Stealth).

---
## Why this works
DECNET already separates **decoy traffic** from **management traffic** at
the worker design level — decoys live on a MACVLAN/IPVLAN bridge with
their own MAC and (often) public IP, while the management daemons
(`decnet api`, `decnet agent`, `decnet listener`, `decnet updater`,
`decnet swarmctl`) listen on the host's primary NIC.

Most management daemons bind to `0.0.0.0` by default (`swarmctl` is the
exception, at `127.0.0.1`). That is fine on a LAN and catastrophic on a
$4/mo VPS. Tailscale gives you a second interface,
`tailscale0`, with a stable `100.x.y.z` address and MagicDNS name
(`<host>.<tailnet>.ts.net`). Bind management to that interface and the
public side of the box advertises only the decoys.
What this does **not** do:
- It does not anonymize the deckies. Attackers see your VPS's real
public IP. That's the whole point.
- It does not protect against a compromised decky pivoting back to the
host's loopback. That's still on you (container isolation, no shared
volumes, no host-net mode — see
[Networking-MACVLAN-IPVLAN](Networking-MACVLAN-IPVLAN)).
- It does not stop a misconfigured firewall from leaking the management
ports. Tailscale is a network *path*, not a firewall — you still need
the firewall (below).
---
## Topology in one picture
```
┌───────────────── tailnet (100.64.0.0/10) ─── WireGuard mesh ──────────────────┐
│                                                                               │
┌── you ──┐    ┌─ master VPS ───┐    ┌─ worker VPS (DE) ─┐  ┌─ worker VPS (SG) ─┐
│ laptop  │    │ tailscale0     │    │ tailscale0        │  │ tailscale0        │
│ tailnet │◀──▶│ 100.64.0.1     │◀──▶│ 100.64.0.2        │  │ 100.64.0.3        │
└─────────┘    │                │    │                   │  │                   │
               │ decnet api     │    │ decnet agent      │  │ decnet agent      │
               │ swarmctl       │    │ forwarder         │  │ forwarder         │
               │ listener       │    │ updater           │  │ updater           │
               │ updater        │    │                   │  │                   │
               │ (all bound to  │    │ (all bound to     │  │ (all bound to     │
               │  ts iface)     │    │  ts iface)        │  │  ts iface)        │
               └───────┬────────┘    └─────────┬─────────┘  └─────────┬─────────┘
                       │ eth0 public           │ eth0 public          │ eth0 public
                       │ (nothing! optional    │ DECKIES on macvlan   │ DECKIES on macvlan
                       │  static brochure)     │                      │
                       ▼                       ▼                      ▼
  Internet ◀─── attackers ───▶ Decoys (public IP, no Tailscale)
```
**Two planes, two NICs.** Management is invisible to anyone not on the
tailnet. Decoys are as exposed as they need to be to attract traffic.

---
## Bind-address cheat sheet
| Daemon | Default bind | What you want | Flag / env |
|---------------------|--------------|---------------|------------------------------------------|
| `decnet api` | `0.0.0.0` | tailscale0 | `--host 100.x.y.z` or `DECNET_API_HOST` |
| `decnet web` | `0.0.0.0` | tailscale0 | `--host 100.x.y.z` |
| `decnet swarmctl` | `127.0.0.1` | tailscale0 | `--host 100.x.y.z` |
| `decnet listener` | `0.0.0.0` | tailscale0 | `--host 100.x.y.z` |
| `decnet agent` | `0.0.0.0` | tailscale0 | `--host 100.x.y.z` |
| `decnet updater` | `0.0.0.0` | tailscale0 | `--host 100.x.y.z` |
| Management SSH | n/a | tailscale0 | `ListenAddress 100.x.y.z` in sshd_config |
| Decoy services | host bridge | public NIC | unchanged — DECNET handles this |

`100.x.y.z` is the box's own Tailscale-assigned address (`tailscale ip
-4`). To script it on Linux without the Tailscale CLI, read the address
off the interface with `ip -4 addr show tailscale0`.

The MagicDNS name (`master.<tailnet>.ts.net`) is what you'll point your
browser and CLI at. Don't bake `100.x.y.z` literals into client-side
configs; the addresses are stable, but the DNS name is more portable.

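
If you'd rather script the bind address, here is a minimal sketch, assuming a Linux host with iproute2. It reads the address off `tailscale0` and refuses anything outside Tailscale's `100.64.0.0/10` range, so a missing or down interface fails loudly instead of a daemon binding somewhere unexpected. The function names are illustrative, not part of DECNET.

```bash
#!/usr/bin/env bash
# Sketch: derive the tailnet bind address and refuse to continue without one.

# Print the first IPv4 address (prefix length stripped) from
# `ip -o -4 addr show dev tailscale0` output read on stdin.
ts_ipv4_from_ip_output() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Succeed only if $1 is in 100.64.0.0/10 (first octet 100, second 64-127).
in_cgnat() {
  local o1 o2
  IFS=. read -r o1 o2 _ <<<"$1"
  [[ $o1 == 100 && $o2 =~ ^[0-9]+$ && $o2 -ge 64 && $o2 -le 127 ]]
}

# Print the tailscale0 address, or fail with a message on stderr.
require_tailnet_ip() {
  local ip
  ip=$(ip -o -4 addr show dev tailscale0 2>/dev/null | ts_ipv4_from_ip_output)
  if [[ -z $ip ]] || ! in_cgnat "$ip"; then
    echo "tailscale0 has no 100.64.0.0/10 address; refusing to bind" >&2
    return 1
  fi
  printf '%s\n' "$ip"
}
```

Used as `TS_IP=$(require_tailnet_ip) || exit 1` ahead of the `decnet ... --host "$TS_IP"` lines in the walkthrough, this keeps the interface choice explicit rather than guessed at startup.
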
---
## Setup walkthrough
A live-QA-ready single-master, two-worker SWARM deployment on three
VPSes. UNIHOST-on-Tailscale is the same minus steps 5 to 7.
### 1. Provision the boxes and install Tailscale
Pick a provider that **explicitly permits honeypots** in their AUP.
Hetzner, OVH, Vultr are typically OK if you respond to abuse reports
within 24h; some are not. Read the ToS. Getting your account nuked
mid-experiment is a waste of a tailnet.
Per host:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up --ssh --advertise-tags=tag:decnet-master # on the master
sudo tailscale up --ssh --advertise-tags=tag:decnet-worker # on each worker
tailscale ip -4 # write this down, you'll bind DECNET to it
```
`--ssh` lets you reach the box's *management* SSH via Tailscale SSH
without needing public 22 open. That alone gets you halfway.
`--advertise-tags` lets the ACL (next step) tell master from worker. A
node can only advertise tags that the tailnet policy already lists in
`tagOwners`, so save at least the `tagOwners` block from step 2 before
running `tailscale up`; otherwise the tags are rejected.
### 2. Lock down the tailnet ACL
In the Tailscale admin console, replace the default "everyone can talk
to everyone" ACL with role-based rules:
```jsonc
{
"tagOwners": {
"tag:decnet-master": ["autogroup:admin"],
"tag:decnet-worker": ["autogroup:admin"],
"tag:decnet-ops": ["autogroup:admin"] // your laptop / phone
},
"acls": [
// Ops machines can reach everything DECNET on its management ports.
{ "action": "accept",
"src": ["tag:decnet-ops"],
"dst": ["tag:decnet-master:8000,8770,6514",
"tag:decnet-worker:8765,8766"] },
// Master ↔ worker control plane.
{ "action": "accept",
"src": ["tag:decnet-master"],
"dst": ["tag:decnet-worker:8765,8766"] },
// Worker → master log forwarding (RFC 5425).
{ "action": "accept",
"src": ["tag:decnet-worker"],
"dst": ["tag:decnet-master:6514"] },
// SSH for ops.
{ "action": "accept",
"src": ["tag:decnet-ops"],
"dst": ["tag:decnet-master:22", "tag:decnet-worker:22"] }
],
"ssh": [
{ "action": "check",
"src": ["autogroup:admin"],
"dst": ["tag:decnet-master", "tag:decnet-worker"],
"users": ["root", "decnet"] }
]
}
```
Workers cannot reach each other's management ports. The master cannot be
reached by random tagged nodes. This is a **deny-by-default** posture.
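
To spot-check that the ACL behaves as described, probe ports from each role's node. A minimal sketch, assuming bash with `/dev/tcp` support and coreutils `timeout`; `probe` and `assert_port` are illustrative helpers, not DECNET commands.

```bash
#!/usr/bin/env bash
# A port counts as "open" if a TCP connect succeeds within 3 seconds.
# ACL denials drop packets silently, so they surface as timeouts ("closed").

probe() {  # probe HOST PORT
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

assert_port() {  # assert_port open|closed HOST PORT
  local want=$1 host=$2 port=$3 got=closed
  if probe "$host" "$port"; then got=open; fi
  if [[ $got == "$want" ]]; then
    echo "ok   $host:$port $got"
  else
    echo "FAIL $host:$port $got (wanted $want)"
    return 1
  fi
}
```

Run from a worker, `assert_port open master.<tailnet>.ts.net 6514` should pass, as should `assert_port closed master.<tailnet>.ts.net 8000` and `assert_port closed worker-xx.<tailnet>.ts.net 8765` against a sibling worker. A "closed" result can't tell an ACL drop from a stopped daemon, so pair it with `ss` on the target when something looks off.
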
### 3. Firewall the public NIC
Tailscale is a path; the firewall is the wall. On every VPS, drop
inbound traffic to the management ports on the public interface.
Nothing is bound there today, but the rule keeps a future
misconfiguration from quietly exposing them.
```bash
# Replace eth0 with your actual public NIC (`ip route show default`).
sudo nft add table inet decnet
sudo nft add chain inet decnet input '{ type filter hook input priority 0; policy accept; }'
# Drop management ports on the public NIC, regardless of bind.
sudo nft add rule inet decnet input iif eth0 tcp dport \
{ 8000, 8765, 8766, 8770, 6514, 2222, 22222 } drop
# Optionally: drop public 22 entirely now that Tailscale SSH carries you.
sudo nft add rule inet decnet input iif eth0 tcp dport 22 drop
```
Decoy ports (whatever the deployer assigns — typically 22, 80, 443, 445,
3389, 21, etc. on the *MACVLAN* sub-interface) are unaffected because
they're on a separate logical interface.
If you're more comfortable in `ufw`:
```bash
sudo ufw deny in on eth0 to any port 8000,8765,8766,8770,6514,2222,22222 proto tcp
```
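
The `nft add` commands above don't survive a reboot, and re-running `nft add rule` appends duplicate rules. A sketch of a repeatable alternative, assuming nftables with file loading via `nft -f`: render the same rules into a file and load it atomically. `render_ruleset` is a hypothetical helper; it swaps `iif` for `iifname`, which matches by interface name rather than index.

```bash
#!/usr/bin/env bash
# Render the step 3 rules as an nftables file. The create/delete/recreate
# preamble makes the file safe to load repeatedly with `nft -f`.
MGMT_PORTS=(8000 8765 8766 8770 6514 2222 22222)

render_ruleset() {  # render_ruleset PUBLIC_IFACE
  local iface=${1:?usage: render_ruleset PUBLIC_IFACE}
  local ports
  ports=$(IFS=,; echo "${MGMT_PORTS[*]}")   # "8000,8765,..."
  ports=${ports//,/, }                       # "8000, 8765, ..."
  cat <<EOF
table inet decnet
delete table inet decnet
table inet decnet {
  chain input {
    type filter hook input priority 0; policy accept;
    iifname "$iface" tcp dport { $ports } drop
    iifname "$iface" tcp dport 22 drop   # optional: public SSH
  }
}
EOF
}
```

For example, `render_ruleset eth0 | sudo tee /etc/nftables.d/decnet.nft >/dev/null && sudo nft -f /etc/nftables.d/decnet.nft`. The `/etc/nftables.d/` path is a placeholder; point your distro's boot-time nftables loading at wherever you keep the file.
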
### 4. Master — start the control plane bound to Tailscale
```bash
TS_IP=$(tailscale ip -4 | head -n1)
# API + dashboard.
decnet api --daemon --host "$TS_IP" --port 8000
# SWARM control plane. The default is 127.0.0.1; we move it to tailscale0
# so you can drive `decnet swarm ...` from your laptop without SSHing in.
decnet swarmctl --daemon --host "$TS_IP" --port 8770
# RFC 5425 syslog-over-TLS sink.
decnet listener --daemon \
--host "$TS_IP" --port 6514 \
--log-path ~/.decnet/master-logs/master.log \
--json-path ~/.decnet/master-logs/master.json
```
Verify from your laptop:
```bash
curl -s -o /dev/null -w '%{http_code}\n' http://master.<tailnet>.ts.net:8000/api/v1/health
# Expect 401 (the health endpoint is auth-gated): reachable and rejecting
# unauthenticated calls, which is success. Don't add -f; it hides the code.
```
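A connection timeout means packets are being dropped (a tailnet ACL, or
a firewall rule on `tailscale0`); "connection refused" means nothing is
listening on the tailnet address, i.e. the bind is wrong. Either way,
`sudo ss -tlnp | grep 8000` on the master should show `100.x.y.z:8000`,
not `0.0.0.0:8000`.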
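
The same check can be scripted across all management ports, which is handy for the live-QA checklist. A minimal sketch, assuming iproute2's `ss` and POSIX `awk`; `audit_binds` is an illustrative name, and the port list mirrors the firewall rule in step 3.

```bash
#!/usr/bin/env bash
# Read `ss -Htln` output on stdin; flag management ports on wildcard binds.
MGMT_PORTS=(8000 8765 8766 8770 6514 2222 22222)

audit_binds() {
  awk -v ports="${MGMT_PORTS[*]}" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    {
      addr = $4                           # "Local Address:Port" column
      if (!match(addr, /:[0-9]+$/)) next  # anchored at the end: the LAST colon
      host = substr(addr, 1, RSTART - 1)  # so "[::]:8000" splits correctly
      port = substr(addr, RSTART + 1)
      if ((port in want) && (host == "0.0.0.0" || host == "[::]" || host == "*")) {
        print "EXPOSED " addr
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

`ss -Htln | audit_binds` prints one `EXPOSED` line per wildcard-bound management port and exits non-zero if it finds any. Add 22 to the list if your management `sshd` listens there. Decoy listeners live in their containers' network namespaces, so they don't show up in the host's `ss` output.
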
### 5. Master — enroll workers with their MagicDNS names
When you enroll a worker, the `--address` is what the master will dial.
Use the MagicDNS name so you don't have to re-enroll if Tailscale
re-assigns the 100.x address.
```bash
decnet swarm enroll \
--name worker-de \
--address worker-de.<tailnet>.ts.net \
--sans worker-de.<tailnet>.ts.net,100.64.0.2 \
--out-dir /tmp/worker-de-bundle
```
Include both the DNS name and the current `100.x` address in `--sans`
so cert validation succeeds whichever the master happens to dial.
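
Before shipping the bundle, you can confirm the issued certificate really carries both names. A sketch, assuming OpenSSL 1.1.1 or newer (for `-ext`); `check_sans` is a hypothetical helper that reads the `subjectAltName` dump on stdin.

```bash
#!/usr/bin/env bash
# check_sans DNS_NAME IP: verify both SANs appear in
# `openssl x509 -noout -ext subjectAltName` output read on stdin.
check_sans() {
  local name=$1 ip=$2 sans missing=0
  # One SAN per line with surrounding whitespace trimmed, e.g. "DNS:host".
  sans=$(tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
  grep -Fxq "DNS:$name" <<<"$sans" || { echo "missing DNS:$name"; missing=1; }
  grep -Fxq "IP Address:$ip" <<<"$sans" || { echo "missing IP Address:$ip"; missing=1; }
  return "$missing"
}
```

For example, `openssl x509 -in /tmp/worker-de-bundle/<worker cert>.pem -noout -ext subjectAltName | check_sans worker-de.<tailnet>.ts.net 100.64.0.2`; the certificate's file name inside the bundle is a placeholder. Matching whole lines means `100.64.0.2` won't accidentally match a SAN of `100.64.0.20`.
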
### 6. Ship the bundle over Tailscale itself
```bash
scp -r /tmp/worker-de-bundle/* anti@worker-de.<tailnet>.ts.net:~/.decnet/agent/
```
No public SSH involved. The bundle never crosses the open internet.
### 7. Worker — start agent + forwarder bound to Tailscale
```bash
TS_IP=$(tailscale ip -4 | head -n1)
sudo decnet agent --daemon \
--host "$TS_IP" --port 8765 \
--agent-dir /home/anti/.decnet/agent
decnet forwarder --daemon \
--master-host master.<tailnet>.ts.net \
--master-port 6514 \
--log-file /var/log/decnet/decnet.log \
--state-db ~/.decnet/agent/forwarder.db \
--agent-dir /home/anti/.decnet/agent
```
`master.<tailnet>.ts.net` resolves over MagicDNS — no `/etc/hosts`
hack, no static IPs.
### 8. Deploy deckies as usual
```bash
decnet deploy --mode swarm --deckies 8 --randomize-services
```
The deployer doesn't know or care that the control plane is on
Tailscale. Deckies bind to the worker's MACVLAN bridge on the public
NIC; that path is untouched.

---
## Single-host (UNIHOST) variant
If you're running the whole thing on one VPS and just want your laptop
to reach the dashboard privately:
```bash
TS_IP=$(tailscale ip -4 | head -n1)
decnet api --daemon --host "$TS_IP" --port 8000
decnet web --daemon --host "$TS_IP" --port 5173 # if you serve the SPA separately
sudo decnet deploy --mode unihost --deckies 5 --interface eth0 --randomize-services
```
Browse to `http://<host>.<tailnet>.ts.net:8000/`. Done. The deckies are
on `eth0`'s MACVLAN bridge and reachable by attackers; the dashboard is
not.
For HTTPS on the dashboard you can use Tailscale's own certificate
provisioning (`tailscale cert <host>.<tailnet>.ts.net`) and feed the result
to `decnet api --ssl-keyfile ... --ssl-certfile ...`. With MagicDNS and
HTTPS certificates enabled in the admin console, Tailscale issues real
Let's Encrypt certificates for tailnet hostnames at no cost.

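
The two steps can be wrapped in a small script. A sketch, assuming it runs as root on the UNIHOST box; `dashboard_tls`, `run`, and the default `/etc/decnet/tls` directory are hypothetical, and the `decnet api` flags are the ones shown above.

```bash
#!/usr/bin/env bash
# run(): execute a command, or only print it when DRY_RUN=1.
run() { if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi; }

dashboard_tls() {  # dashboard_tls FQDN TS_IP [CERT_DIR]
  local fqdn=$1 ts_ip=$2 dir=${3:-/etc/decnet/tls}  # default dir is hypothetical
  local crt="$dir/$fqdn.crt" key="$dir/$fqdn.key"
  run mkdir -p "$dir"
  # Ask tailscaled for a Let's Encrypt certificate for the tailnet hostname.
  run tailscale cert --cert-file "$crt" --key-file "$key" "$fqdn"
  # Serve the API over TLS on the tailnet address only.
  run decnet api --daemon --host "$ts_ip" --port 8000 \
    --ssl-certfile "$crt" --ssl-keyfile "$key"
}
```

`DRY_RUN=1 dashboard_tls <host>.<tailnet>.ts.net "$(tailscale ip -4 | head -n1)"` prints the commands without running them. Let's Encrypt certificates are short-lived (90 days), so plan to rerun the `tailscale cert` step on a schedule and restart the API afterwards.
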
---
## Live-QA hardening checklist
You're throwing this at the wild. Before you flip the switch:
- [ ] **Provider AUP read.** Honeypots explicitly permitted, abuse
contact monitored.
- [ ] **No real secrets on the boxes.** No personal SSH keys, no AWS
credentials, no password reuse. Treat each VPS as compromised
from day zero.
- [ ] **Decky containers are not on host network.** MACVLAN/IPVLAN
only. `--network host` anywhere is a bug, not a shortcut.
- [ ] **Management SSH on Tailscale only.** Public 22 dropped at the
firewall (and ideally `ListenAddress 100.x.y.z` in
`sshd_config`).
- [ ] **All DECNET daemons bound to tailscale0.** `ss -tlnp` confirms
no `0.0.0.0` for management ports.
- [ ] **Firewall drops management ports on public NIC** as
belt-and-suspenders.
- [ ] **mTLS bundles intact** on every worker. `openssl s_client
-connect worker:8765` from a non-tailnet host fails at TCP
(firewalled), and from a tailnet host without a client cert
fails at handshake.
- [ ] **Time sync running.** mTLS will fail with skew >5min.
`chronyc tracking` healthy on every host.
- [ ] **Forwarder buffering tested.** `systemctl stop decnet-listener`
on the master for a minute; logs accumulate locally; on restart
they replay without gaps. (See [SWARM-Mode § Master crash](SWARM-Mode#master-crash--restart).)
- [ ] **Kill switch documented.** One command per box that stops the
decoys and seals the management plane. `decnet teardown --all`
+ `tailscale down` on the master is the minimum.
- [ ] **Abuse playbook written.** When the provider emails you about
an SSH brute-force complaint, you need a one-paragraph reply
ready that explains "this is an authorized honeypot, here's the
research contact" — not a panicked decommission.
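
A sketch of the kill switch from the checklist, assuming it runs as root on each box; `kill_switch` and `run` are illustrative names, and the two commands are the minimum listed above.

```bash
#!/usr/bin/env bash
# run(): execute a command, or only print it when DRY_RUN=1.
run() { if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi; }

kill_switch() {
  # Decoys first: `tailscale down` cuts the tailnet session you are
  # probably running this over. A failed teardown must not stop the seal.
  run decnet teardown --all || echo "teardown failed; sealing anyway" >&2
  run tailscale down
}
```

Run it with `DRY_RUN=1` first to see what it would do. If you launch it over Tailscale SSH, the session drops as soon as `tailscale down` runs, which is why the decoys are torn down first; getting back in afterwards means your provider's out-of-band console.
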
---
## Operational notes
### Use Serve, never Funnel
Tailscale Funnel exposes a tailnet service to the public internet. **Do
not use it for any DECNET management endpoint**, ever. The whole point
is to keep them off the public internet. Funnel has its place (publicly
reachable canary pages, decoy brochure sites) but not for the dashboard.

`tailscale serve` (TLS termination inside the tailnet) is fine and can
save you the cert dance for the dashboard. Serve proxies to a local
backend, so in this setup bind the API to loopback and let Serve publish
it on the tailnet:
```bash
decnet api --daemon --host 127.0.0.1 --port 8000
tailscale serve --bg --https=443 http://127.0.0.1:8000
```
Now `https://<master>.<tailnet>.ts.net/` works in a browser with a real
cert, and the API itself listens only on loopback. Add
`tag:decnet-master:443` to the ops rule in the ACL so your laptop can
reach it.
### Cross-region latency
The forwarder is async and tolerates 200 to 400 ms RTT just fine. The
mTLS handshake adds one round trip on reconnect. Sub-second delivery is
not a realistic target for SG↔EU log shipping and isn't required: RFC 5425
only specifies the TLS transport, and it's the forwarder's offset tracking
that makes delivery gap-free (not real-time).
`decnet swarm check` on a high-latency worker can take a couple of
seconds per host. That's the master polling `/health` synchronously over
mTLS over 200ms RTT — expected.
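
To see where the time goes on a slow link, curl can split TCP connect from TLS setup via its `time_connect` and `time_appconnect` write-out variables, both measured in seconds from the start of the request. A sketch, assuming the agent serves `/health` over HTTPS; `measure_handshake` is an illustrative helper.

```bash
#!/usr/bin/env bash
# measure_handshake URL [extra curl args...]
# TLS handshake cost = time_appconnect - time_connect.
measure_handshake() {
  local url=$1; shift
  curl -s -o /dev/null -w '%{time_connect} %{time_appconnect}\n' "$@" "$url" |
    awk '{
      if ($2 == 0) printf "tcp=%.0fms tls=failed\n", $1 * 1000   # no TLS completed
      else printf "tcp=%.0fms tls=%.0fms\n", $1 * 1000, ($2 - $1) * 1000
    }'
}
```

For example, `measure_handshake https://worker-de.<tailnet>.ts.net:8765/health --cacert ca.pem --cert client.pem --key client.key`, where the certificate file names are placeholders for whatever your enrollment bundle contains. With TLS 1.3 on a 200 ms path, expect each figure to land roughly near one RTT.
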
### What to watch in the dashboard
For a wild deployment, the high-signal pages:
- **Attackers** — who's hitting you, source ASN, reputation.
- **Sessions** — full transcripts. Real attackers diverge from script
kiddies fast.
- **Credentials** — what they're trying. (See the credentials view —
this is where DEBT-040 phase 3 RDP captures end up.)
- **Live logs** (SSE) — useful in the first hour of a deploy to confirm
the pipeline is wet.
### When to break glass
If a worker's host itself looks compromised (not just a decky — the
host), pull it from the tailnet first and the swarm second:
```bash
# From your laptop:
tailscale set --auto-update=false # (on the worker, but you may need ssh)
# Faster: yank it from the admin console — "Disable" the device.
# Then on the master:
decnet swarm decommission --name worker-xx --yes
```
Disabling at the Tailscale console severs the management plane
instantly without needing the worker to cooperate.
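
If the admin console isn't at hand, the same removal can go through the Tailscale REST API's device endpoint (`DELETE /api/v2/device/{deviceId}`). A sketch, assuming `TS_API_KEY` holds an API access token allowed to manage devices and that you've already looked up the device's ID (for example from `GET /api/v2/tailnet/-/devices`); `ts_remove_device` is an illustrative helper.

```bash
#!/usr/bin/env bash
# ts_remove_device DEVICE_ID: remove a device from the tailnet via the API.
# DRY_RUN=1 prints the request with the token masked instead of sending it.
ts_remove_device() {
  local device_id=${1:?usage: ts_remove_device DEVICE_ID}
  local url="https://api.tailscale.com/api/v2/device/$device_id"
  : "${TS_API_KEY:?set TS_API_KEY to a Tailscale API access token}"
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "curl -fsS -X DELETE -H 'Authorization: Bearer ***' $url"
  else
    curl -fsS -X DELETE -H "Authorization: Bearer $TS_API_KEY" "$url"
  fi
}
```

Masking the token in dry-run output keeps it out of scrollback and screen recordings, which matters when you are rehearsing the break-glass path.
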
---
## Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| `curl http://master.<tailnet>.ts.net:8000` from laptop times out | API still bound to `0.0.0.0` but firewall drops it; or bound to `127.0.0.1`; or laptop not on tailnet | `ss -tlnp \| grep 8000` on master; `tailscale status` on laptop |
| Worker forwarder logs `ConnectionRefusedError` to master:6514 | Listener not running, or bound to `127.0.0.1` instead of the tailnet address (an ACL or firewall drop shows up as a timeout, not a refusal) | Re-bind listener with `--host $TS_IP`; if you see timeouts instead, check the ACL `dst` includes `tag:decnet-master:6514` |
| `decnet swarm check` says `reachable: false` for one worker | ACL doesn't allow `tag:decnet-master → tag:decnet-worker:8765`; or worker agent bound to `0.0.0.0` while firewall drops public side and Tailscale rule isn't matched | Check ACL; `ss -tlnp \| grep 8765` on worker |
| `ssl.SSLCertVerificationError: Hostname mismatch` from master to worker | The name the master dials (the enrollment `--address`) isn't in the cert's `--sans` | Re-enroll with `--sans worker.<tailnet>.ts.net,100.x.y.z` |
| Tailscale SSH works but `scp` doesn't | `scp` is aimed at the public address (public 22 is firewalled) instead of the tailnet name, or the login user isn't in the ACL `ssh` rule's `users` list | Target the MagicDNS name (`user@<host>.<tailnet>.ts.net`) so Tailscale SSH handles it; add the user to `users` |
| Public 22 still being brute-forced even though "firewalled" | The firewall rule is on the wrong interface, or `ufw` ordering put a permissive rule above the deny | `nft list ruleset` and read it top to bottom — don't trust your own config without verifying |
| Decoy traffic also disappears after firewall changes | You dropped on the wrong interface or used `INPUT` policy `DROP` without explicit accepts for the MACVLAN bridge | MACVLAN sub-interfaces have their own naming (e.g. `decnet_macvlan0`) — `iif eth0` should not match them, but verify with `tcpdump` |
| Browser says "your connection is not private" on `<host>.ts.net` | `tailscale serve` not configured, or `tailscale cert` not run for this hostname | `tailscale cert <host>.<tailnet>.ts.net` then point `decnet api --ssl-*` at the resulting files, or use `tailscale serve --https=443` to terminate |
| Master is on Tailscale but workers can't resolve `master.<tailnet>.ts.net` | MagicDNS not enabled on the tailnet, or `--accept-dns=false` was used at `tailscale up` | Enable MagicDNS in admin console; run `tailscale set --accept-dns=true` on workers |
---
## Known limitations
- **No automated bind discovery.** DECNET doesn't auto-detect "use
tailscale0 if present" — you pass `--host` or set the env var
yourself. This is intentional: silently picking an interface based on
what's up at startup is exactly the kind of magic that gets your
management plane on the public internet by accident after a reboot.
- **Tailnet outage = control-plane outage.** If Tailscale's coordination
server is unreachable from a worker, new control-plane connections
fail. Existing WireGuard tunnels stay up (DERP relays handle most
cases), but a cold worker after a tailnet outage won't reach the
master until tailnet recovers. The decoys keep serving attackers and
the forwarder keeps buffering — same story as a master outage.
- **Tailscale free plan caps.** Free tier is generous (100 devices,
3 users at the time of writing) but has limits. A 50-VPS DECNET
deployment fits comfortably; a 500-VPS one does not.

@@ -18,6 +18,7 @@
- [Networking-MACVLAN-IPVLAN](Networking-MACVLAN-IPVLAN)
- [Deployment-Modes](Deployment-Modes)
- [SWARM-Mode](SWARM-Mode)
- [Tailscale-Global-Deployment](Tailscale-Global-Deployment)
- [MazeNET](MazeNET)
- [Remote-Updates](Remote-Updates)
- [Environment-Variables](Environment-Variables)