diff --git a/Tailscale-Global-Deployment.md b/Tailscale-Global-Deployment.md
new file mode 100644
index 0000000..e151d7b
--- /dev/null
+++ b/Tailscale-Global-Deployment.md
@@ -0,0 +1,455 @@
+# Tailscale Global Deployment
+
+This page is the playbook for taking DECNET into the wild — cheap VPSes
+scattered across regions, real attackers hitting them, telemetry flowing
+home — without leaving your management plane on the public internet.
+
+The rule is simple:
+
+> **Decoy services bind to the public NIC. Everything else binds to
+> Tailscale.**
+
+Attackers reach `sshd:22`, `smbd:445`, `httpd:80`, etc. on the box's
+public address. The dashboard, REST API, SWARM control plane, log
+listener, agent, updater, and any management SSH port (2222, 22222,
+whatever you use) are not reachable except via your tailnet. Port scans
+of the host find decoys and nothing else.
+
+This is not a replacement for [PKI and mTLS](PKI-and-mTLS) — control-plane
+mTLS is still on, always. Tailscale is the **network gate that drops the
+packet before TLS even sees it**. Defense in depth: an attacker would
+have to first land on your tailnet (WireGuard + your IdP) *and then*
+present a CA-signed cert. Either layer alone is sufficient; both
+together are the policy.
+
+See also: [SWARM Mode](SWARM-Mode), [Deployment Modes](Deployment-Modes),
+[Networking: MACVLAN and IPVLAN](Networking-MACVLAN-IPVLAN),
+[Security and Stealth](Security-and-Stealth).
+
+---
+
+## Why this works
+
+DECNET already separates **decoy traffic** from **management traffic** at
+the worker design level — decoys live on a MACVLAN/IPVLAN bridge with
+their own MAC and (often) public IP, while the management daemons
+(`decnet api`, `decnet agent`, `decnet listener`, `decnet updater`,
+`decnet swarmctl`) listen on the host's primary NIC.
+
+The default bind for most management daemons is `0.0.0.0`, which is fine on a
+LAN and catastrophic on a $4/mo VPS. Tailscale gives you a second interface,
+`tailscale0`, with a stable `100.x.y.z` address and MagicDNS name
+(`<host>.<tailnet>.ts.net`). Bind management to that interface and the
+public side of the box advertises only the decoys.
+
+What this does **not** do:
+
+- It does not anonymize the deckies. Attackers see your VPS's real
+  public IP. That's the whole point.
+- It does not protect against a compromised decky pivoting back to the
+  host's loopback. That's still on you (container isolation, no shared
+  volumes, no host-net mode — see
+  [Networking-MACVLAN-IPVLAN](Networking-MACVLAN-IPVLAN)).
+- It does not stop a misconfigured firewall from leaking the management
+  ports. Tailscale is a network *path*, not a firewall — you still need
+  the firewall (below).
+
+---
+
+## Topology in one picture
+
+```
+      ┌────────── tailnet (100.64.0.0/10) ─── WireGuard mesh ───────────┐
+      │                                                                 │
+ ┌── you ──┐    ┌─ master VPS ─┐    ┌─ worker VPS (DE) ─┐     ┌─ worker VPS (SG) ─┐
+ │ laptop  │    │ tailscale0   │    │ tailscale0        │     │ tailscale0        │
+ │ tailnet │◀──▶│ 100.64.0.1   │◀──▶│ 100.64.0.2        │     │ 100.64.0.3        │
+ └─────────┘    │              │    │                   │     │                   │
+                │ decnet api   │    │ decnet agent      │     │ decnet agent      │
+                │ swarmctl     │    │ forwarder ────────┼──┐  │ forwarder ────────┼──┐
+                │ listener     │    │ updater           │  │  │ updater           │  │
+                │ updater      │    │                   │  │  │                   │  │
+                │ (all bound   │    │ (all bound to     │  │  │ (all bound to     │  │
+                │  to ts iface)│    │  ts iface)        │  │  │  ts iface)        │  │
+                └──────┬───────┘    └─────────┬─────────┘  │  └─────────┬─────────┘  │
+                       │ eth0 public          │ eth0 public│            │ eth0       │
+                       │ (nothing! optional   │ DECKIES    │            │ DECKIES    │
+                       │  static brochure)    │ on macvlan │            │ on macvlan │
+                       ▼                      ▼            │            ▼            │
+                                                           │                         │
+                     Internet ◀─── attackers ───▶ Decoys ◀─┴─────────────────────────┘
+                                           (public IP, no Tailscale)
+```
+
+**Two planes, two NICs.** Management is invisible to anyone not on the
+tailnet. Decoys are as exposed as they need to be to attract traffic.
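The tailnet range in the picture, `100.64.0.0/10`, spans `100.64.0.0` through `100.127.255.255`. A minimal sketch of a guard that rejects any candidate bind address outside that range, assuming a POSIX shell. The function name `is_tailnet_ip` is hypothetical and is not part of DECNET or Tailscale.

```bash
#!/bin/sh
# Hypothetical helper (not a DECNET or Tailscale command): succeeds only when
# the argument is an IPv4 address inside 100.64.0.0/10, i.e. the first octet
# is 100 and the second octet is between 64 and 127.
is_tailnet_ip() {
    case "$1" in
        ""|*[!0-9.]*|.*|*.|*..*) return 1 ;;   # empty, stray characters, or empty octets
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1                                   # split the address into octets
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
    [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ]
}
```

One way to use it is to run `is_tailnet_ip "$(tailscale ip -4 | head -n1)" || exit 1` before starting any daemon, so that an empty or public address never reaches `--host`.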
+
+---
+
+## Bind-address cheat sheet
+
+| Daemon              | Default bind | What you want | Flag / env                               |
+|---------------------|--------------|---------------|------------------------------------------|
+| `decnet api`        | `0.0.0.0`    | tailscale0    | `--host 100.x.y.z` or `DECNET_API_HOST`  |
+| `decnet web`        | `0.0.0.0`    | tailscale0    | `--host 100.x.y.z`                       |
+| `decnet swarmctl`   | `127.0.0.1`  | tailscale0    | `--host 100.x.y.z`                       |
+| `decnet listener`   | `0.0.0.0`    | tailscale0    | `--host 100.x.y.z`                       |
+| `decnet agent`      | `0.0.0.0`    | tailscale0    | `--host 100.x.y.z`                       |
+| `decnet updater`    | `0.0.0.0`    | tailscale0    | `--host 100.x.y.z`                       |
+| Management SSH      | n/a          | tailscale0    | `ListenAddress 100.x.y.z` in sshd_config |
+| Decoy services      | host bridge  | public NIC    | unchanged — DECNET handles this          |
+
+`100.x.y.z` is the box's own Tailscale-assigned address (`tailscale ip -4`).
+If you want it scripted on Linux, you can also read the address off the
+interface with `ip -4 addr show tailscale0`.
+
+The MagicDNS name (`master.<tailnet>.ts.net`) is what you'll point your
+browser and CLI at. Don't bake `100.x.y.z` literals into configs —
+they're stable but the DNS name is more portable.
+
+---
+
+## Setup walkthrough
+
+A live-QA-ready single-master, two-worker SWARM deployment on three
+VPSes. UNIHOST-on-Tailscale is the same minus steps 5–7.
+
+### 1. Provision the boxes and install Tailscale
+
+Pick a provider that **explicitly permits honeypots** in their AUP.
+Hetzner, OVH, and Vultr are typically OK if you respond to abuse reports
+within 24h; some providers are not. Read the ToS. Getting your account nuked
+mid-experiment is a waste of a tailnet.
+
+Per host:
+
+```bash
+curl -fsSL https://tailscale.com/install.sh | sh
+sudo tailscale up --ssh --advertise-tags=tag:decnet-master   # on the master
+sudo tailscale up --ssh --advertise-tags=tag:decnet-worker   # on each worker
+tailscale ip -4   # write this down, you'll bind DECNET to it
+```
+
+`--ssh` lets you reach the box's *management* SSH via Tailscale SSH
+without needing public 22 open. That alone gets you halfway.
+
+`--advertise-tags` lets the ACL (next step) discriminate master from
+worker. Tags require pre-authorization in the admin console — set them
+up under **Access controls → Tags** before bringing hosts up.
+
+### 2. Lock down the tailnet ACL
+
+In the Tailscale admin console, replace the default "everyone can talk
+to everyone" ACL with role-based rules:
+
+```jsonc
+{
+  "tagOwners": {
+    "tag:decnet-master": ["autogroup:admin"],
+    "tag:decnet-worker": ["autogroup:admin"],
+    "tag:decnet-ops":    ["autogroup:admin"]   // your laptop / phone
+  },
+  "acls": [
+    // Ops machines can reach everything DECNET on its management ports.
+    { "action": "accept",
+      "src": ["tag:decnet-ops"],
+      "dst": ["tag:decnet-master:8000,8770,6514",
+              "tag:decnet-worker:8765,8766"] },
+
+    // Master ↔ worker control plane.
+    { "action": "accept",
+      "src": ["tag:decnet-master"],
+      "dst": ["tag:decnet-worker:8765,8766"] },
+
+    // Worker → master log forwarding (RFC 5425).
+    { "action": "accept",
+      "src": ["tag:decnet-worker"],
+      "dst": ["tag:decnet-master:6514"] },
+
+    // SSH for ops.
+    { "action": "accept",
+      "src": ["tag:decnet-ops"],
+      "dst": ["tag:decnet-master:22", "tag:decnet-worker:22"] }
+  ],
+  "ssh": [
+    { "action": "check",
+      "src": ["autogroup:admin"],
+      "dst": ["tag:decnet-master", "tag:decnet-worker"],
+      "users": ["root", "decnet"] }
+  ]
+}
+```
+
+Workers cannot reach each other's management ports. The master cannot be
+reached by random tagged nodes. This is a **deny-by-default** posture.
+
+### 3. Firewall the public NIC
+
+Tailscale is a path; the firewall is the wall. On every VPS, drop
+inbound to management ports on the public interface — even though
+nothing is bound there, a future misconfiguration shouldn't be an
+accident waiting to happen.
+
+```bash
+# Replace eth0 with your actual public NIC (`ip route show default`).
+sudo nft add table inet decnet
+sudo nft add chain inet decnet input '{ type filter hook input priority 0; policy accept; }'
+
+# Drop management ports on the public NIC, regardless of bind.
+sudo nft add rule inet decnet input iif eth0 tcp dport \
+    '{ 8000, 8765, 8766, 8770, 6514, 2222, 22222 }' drop
+
+# Optionally: drop public 22 entirely now that Tailscale SSH carries you.
+sudo nft add rule inet decnet input iif eth0 tcp dport 22 drop
+```
+
+Decoy ports (whatever the deployer assigns — typically 22, 80, 443, 445,
+3389, 21, etc. on the *MACVLAN* sub-interface) are unaffected because
+they're on a separate logical interface.
+
+If you're more comfortable in `ufw`:
+
+```bash
+sudo ufw deny in on eth0 to any port 8000,8765,8766,8770,6514,2222,22222 proto tcp
+```
+
+### 4. Master — start the control plane bound to Tailscale
+
+```bash
+TS_IP=$(tailscale ip -4 | head -n1)
+
+# API + dashboard.
+decnet api --daemon --host "$TS_IP" --port 8000
+
+# SWARM control plane. The default is 127.0.0.1; we move it to tailscale0
+# so you can drive `decnet swarm ...` from your laptop without SSHing in.
+decnet swarmctl --daemon --host "$TS_IP" --port 8770
+
+# RFC 5425 syslog-over-TLS sink.
+decnet listener --daemon \
+    --host "$TS_IP" --port 6514 \
+    --log-path ~/.decnet/master-logs/master.log \
+    --json-path ~/.decnet/master-logs/master.json
+```
+
+Verify from your laptop:
+
+```bash
+curl -s -o /dev/null -w '%{http_code}\n' http://master.<tailnet>.ts.net:8000/api/v1/health
+# Expect 401 (the health endpoint is auth-gated). 401 means "reachable
+# and rejecting unauthenticated calls" — that's success.
+```
+
+If you get a connection timeout, the bind is wrong or the firewall is
+blocking traffic on tailscale0. `ss -tlnp | grep 8000` on the master
+should show `100.x.y.z:8000`, not `0.0.0.0:8000`.
+
+### 5. Master — enroll workers with their MagicDNS names
+
+When you enroll a worker, the `--address` is what the master will dial.
+Use the MagicDNS name so you don't have to re-enroll if Tailscale
+re-assigns the 100.x address.
+
+```bash
+decnet swarm enroll \
+    --name worker-de \
+    --address worker-de.<tailnet>.ts.net \
+    --sans worker-de.<tailnet>.ts.net,100.64.0.2 \
+    --out-dir /tmp/worker-de-bundle
+```
+
+Include both the DNS name and the current `100.x` address in `--sans`
+so cert validation succeeds whichever the master happens to dial.
+
+### 6. Ship the bundle over Tailscale itself
+
+```bash
+scp -r /tmp/worker-de-bundle/* anti@worker-de.<tailnet>.ts.net:~/.decnet/agent/
+```
+
+No public SSH involved. The bundle never crosses the open internet.
+
+### 7. Worker — start agent + forwarder bound to Tailscale
+
+```bash
+TS_IP=$(tailscale ip -4 | head -n1)
+
+sudo decnet agent --daemon \
+    --host "$TS_IP" --port 8765 \
+    --agent-dir /home/anti/.decnet/agent
+
+decnet forwarder --daemon \
+    --master-host master.<tailnet>.ts.net \
+    --master-port 6514 \
+    --log-file /var/log/decnet/decnet.log \
+    --state-db ~/.decnet/agent/forwarder.db \
+    --agent-dir /home/anti/.decnet/agent
+```
+
+`master.<tailnet>.ts.net` resolves over MagicDNS — no `/etc/hosts`
+hack, no static IPs.
+
+### 8. Deploy deckies as usual
+
+```bash
+decnet deploy --mode swarm --deckies 8 --randomize-services
+```
+
+The deployer doesn't know or care that the control plane is on
+Tailscale. Deckies bind to the worker's MACVLAN bridge on the public
+NIC; that path is untouched.
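The per-daemon `ss -tlnp | grep ...` checks in the walkthrough can be folded into one audit pass. A minimal sketch, assuming the management ports used on this page (8000, 8765, 8766, 8770, 6514) and the usual `ss -tln` column layout from iproute2, where the fourth column is `Local Address:Port`. The function name `wildcard_mgmt_binds` and the `MGMT_PORTS` variable are hypothetical, not part of DECNET.

```bash
#!/bin/sh
# Hypothetical audit helper (not part of DECNET). Reads `ss -tln` output on
# stdin and prints the local address of every listening socket that sits on a
# management port AND is bound to a wildcard (0.0.0.0, [::] or *).
# It prints nothing when every management port is bound to a specific address.
MGMT_PORTS="8000 8765 8766 8770 6514"

wildcard_mgmt_binds() {
    awk -v ports="$MGMT_PORTS" '
        BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) mgmt[p[i]] = 1 }
        $1 == "LISTEN" {
            local_addr = $4                      # e.g. 0.0.0.0:8000 or [::]:8765
            port = local_addr
            sub(/^.*:/, "", port)                # keep what follows the last colon
            host = substr(local_addr, 1, length(local_addr) - length(port) - 1)
            if ((port in mgmt) && (host == "0.0.0.0" || host == "[::]" || host == "*"))
                print local_addr
        }
    '
}
```

On a master or worker, `ss -tln | wildcard_mgmt_binds` should print nothing once every daemon has been started with `--host "$TS_IP"`. Ports outside the list are ignored on purpose, so this says nothing about the decoys.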
+
+---
+
+## Single-host (UNIHOST) variant
+
+If you're running the whole thing on one VPS and just want your laptop
+to reach the dashboard privately:
+
+```bash
+TS_IP=$(tailscale ip -4 | head -n1)
+decnet api --daemon --host "$TS_IP" --port 8000
+decnet web --daemon --host "$TS_IP" --port 5173   # if you serve the SPA separately
+sudo decnet deploy --mode unihost --deckies 5 --interface eth0 --randomize-services
+```
+
+Browse to `http://<host>.<tailnet>.ts.net:8000/`. Done. The deckies are
+on `eth0`'s MACVLAN bridge and reachable to attackers; the dashboard is
+not.
+
+For HTTPS on the dashboard you can use Tailscale's own cert provisioning
+(`tailscale cert <host>.<tailnet>.ts.net`) and feed the result to
+`decnet api --ssl-keyfile ... --ssl-certfile ...`. Tailscale issues real
+Let's Encrypt certs for tailnet hostnames at no cost.
+
+---
+
+## Live-QA hardening checklist
+
+You're throwing this at the wild. Before you flip the switch:
+
+- [ ] **Provider AUP read.** Honeypots explicitly permitted, abuse
+      contact monitored.
+- [ ] **No real secrets on the boxes.** No personal SSH keys, no AWS
+      credentials, no password reuse. Treat each VPS as compromised
+      from day zero.
+- [ ] **Decky containers are not on host network.** MACVLAN/IPVLAN
+      only. `--network host` anywhere is a bug, not a shortcut.
+- [ ] **Management SSH on Tailscale only.** Public 22 dropped at the
+      firewall (and ideally `ListenAddress 100.x.y.z` in
+      `sshd_config`).
+- [ ] **All DECNET daemons bound to tailscale0.** `ss -tlnp` confirms
+      no `0.0.0.0` for management ports.
+- [ ] **Firewall drops management ports on public NIC** as
+      belt-and-suspenders.
+- [ ] **mTLS bundles intact** on every worker. `openssl s_client
+      -connect worker:8765` from a non-tailnet host fails at TCP
+      (firewalled), and from a tailnet host without a client cert
+      fails at handshake.
+- [ ] **Time sync running.** mTLS will fail with skew >5min.
+      `chronyc tracking` healthy on every host.
+- [ ] **Forwarder buffering tested.** `systemctl stop decnet-listener`
+      on the master for a minute; logs accumulate locally; on restart
+      they replay without gaps. (See [SWARM-Mode § Master crash](SWARM-Mode#master-crash--restart).)
+- [ ] **Kill switch documented.** One command per box that stops the
+      decoys and seals the management plane. `decnet teardown --all` +
+      `tailscale down` on the master is the minimum.
+- [ ] **Abuse playbook written.** When the provider emails you about
+      an SSH brute-force complaint, you need a one-paragraph reply
+      ready that explains "this is an authorized honeypot, here's the
+      research contact" — not a panicked decommission.
+
+---
+
+## Operational notes
+
+### Funnel and Serve are not for this
+
+Tailscale Funnel exposes a tailnet service to the public internet. **Do
+not use it for any DECNET management endpoint**, ever. The whole point
+is to keep them off the public internet. Funnel has its place (publicly
+reachable canary pages, decoy brochure sites) but not for the dashboard.
+
+`tailscale serve` (intra-tailnet TLS termination) is fine and can save
+you the cert dance for the dashboard:
+
+```bash
+tailscale serve --bg --https=443 http://localhost:8000
+```
+
+Now `https://<host>.<tailnet>.ts.net/` works in a browser with a real cert. Because the proxy target is
+`localhost:8000`, the API must listen on loopback for this variant (`--host 127.0.0.1`), not only on `100.x.y.z:8000`.
+
+### Cross-region latency
+
+The forwarder is async and tolerates 200–400ms RTT just fine. The mTLS
+handshake adds one round trip on reconnect. Sub-second is not a
+realistic target for SG↔EU log delivery and isn't required — the RFC 5425
+forwarding path is offset-tracked and gap-free, not real-time.
+
+`decnet swarm check` on a high-latency worker can take a couple of
+seconds per host. That's the master polling `/health` synchronously over
+mTLS over 200ms RTT — expected.
+
+### What to watch in the dashboard
+
+For a wild deployment, the high-signal pages:
+
+- **Attackers** — who's hitting you, source ASN, reputation.
+- **Sessions** — full transcripts. Real attackers diverge from script
+  kiddies fast.
+- **Credentials** — what they're trying. (See the credentials view —
+  this is where DEBT-040 phase 3 RDP captures end up.)
+- **Live logs** (SSE) — useful in the first hour of a deploy to confirm
+  the pipeline is wet.
+
+### When to break glass
+
+If a worker's host itself looks compromised (not just a decky — the
+host), pull it from the tailnet first and the swarm second:
+
+```bash
+# On the worker, if you can still SSH in (disconnects it from the tailnet):
+sudo tailscale down
+# Faster: yank it from the admin console — "Disable" the device.
+# Then on the master:
+decnet swarm decommission --name worker-xx --yes
+```
+
+Disabling at the Tailscale console severs the management plane
+instantly without needing the worker to cooperate.
+
+---
+
+## Troubleshooting
+
+| Symptom | Likely cause | Fix |
+|---|---|---|
+| `curl http://master.<tailnet>.ts.net:8000` from laptop times out | API still bound to `0.0.0.0` but firewall drops it; or bound to `127.0.0.1`; or laptop not on tailnet | `ss -tlnp \| grep 8000` on master; `tailscale status` on laptop |
+| Worker forwarder logs `ConnectionRefusedError` to master:6514 | Listener bound to `0.0.0.0` and firewall drops it from tailscale0; or bound to `127.0.0.1`; or ACL blocks worker→master:6514 | Re-bind listener with `--host $TS_IP`; check ACL `dst` includes `tag:decnet-master:6514` |
+| `decnet swarm check` says `reachable: false` for one worker | ACL doesn't allow `tag:decnet-master → tag:decnet-worker:8765`; or worker agent bound to `0.0.0.0` while firewall drops public side and Tailscale rule isn't matched | Check ACL; `ss -tlnp \| grep 8765` on worker |
+| `ssl.SSLCertVerificationError: Hostname mismatch` from master to worker | `--sans` at enrollment didn't include the MagicDNS name passed as `--address` | Re-enroll with `--sans worker.<tailnet>.ts.net,100.x.y.z` |
+| Tailscale SSH works but `scp` doesn't | `scp` doesn't use the Tailscale SSH server; it falls back to OpenSSH on port 22 — which you firewalled off | Either allow port 22 *only from the Tailscale CGNAT range* (`100.64.0.0/10`), or use `tailscale file cp` |
+| Public 22 still being brute-forced even though "firewalled" | The firewall rule is on the wrong interface, or `ufw` ordering put a permissive rule above the deny | `nft list ruleset` and read it top to bottom — don't trust your own config without verifying |
+| Decoy traffic also disappears after firewall changes | You dropped on the wrong interface or used `INPUT` policy `DROP` without explicit accepts for the MACVLAN bridge | MACVLAN sub-interfaces have their own naming (e.g. `decnet_macvlan0`) — `iif eth0` should not match them, but verify with `tcpdump` |
+| Browser says "your connection is not private" on `<host>.<tailnet>.ts.net` | `tailscale serve` not configured, or `tailscale cert` not run for this hostname | `tailscale cert <host>.<tailnet>.ts.net` then point `decnet api --ssl-*` at the resulting files, or use `tailscale serve --https=443` to terminate |
+| Master is on Tailscale but workers can't resolve `master.<tailnet>.ts.net` | MagicDNS not enabled on the tailnet, or `--accept-dns=false` was used at `tailscale up` | Enable MagicDNS in admin console; restart tailscaled on workers |
+
+---
+
+## Known limitations
+
+- **No automated bind discovery.** DECNET doesn't auto-detect "use
+  tailscale0 if present" — you pass `--host` or set the env var
+  yourself. This is intentional: silently picking an interface based on
+  what's up at startup is exactly the kind of magic that gets your
+  management plane on the public internet by accident after a reboot.
+- **Tailnet outage = control-plane outage.** If Tailscale's coordination
+  server is unreachable from a worker, new control-plane connections
+  fail. Existing WireGuard tunnels stay up (DERP relays handle most
+  cases), but a cold worker after a tailnet outage won't reach the
+  master until tailnet recovers. The decoys keep serving attackers and
+  the forwarder keeps buffering — same story as a master outage.
+- **Tailscale free plan caps.** Free tier is generous (100 devices,
+  3 users at the time of writing) but has limits. A 50-VPS DECNET
+  deployment fits comfortably; a 500-VPS one does not.
diff --git a/_Sidebar.md b/_Sidebar.md
index 0ea55fd..5f6afc0 100644
--- a/_Sidebar.md
+++ b/_Sidebar.md
@@ -18,6 +18,7 @@
 - [Networking-MACVLAN-IPVLAN](Networking-MACVLAN-IPVLAN)
 - [Deployment-Modes](Deployment-Modes)
 - [SWARM-Mode](SWARM-Mode)
+- [Tailscale-Global-Deployment](Tailscale-Global-Deployment)
 - [MazeNET](MazeNET)
 - [Remote-Updates](Remote-Updates)
 - [Environment-Variables](Environment-Variables)