# Control Server Operations Guide

**Host:** control (CT 127)
**IP:** 192.168.1.127
**Location:** pve2
**User:** maddox
**Last Updated:** January 23, 2026

---

## Overview

The control server is the centralized command center for managing the Proxmox cluster infrastructure. It provides:

- **Passwordless SSH** to all 13 managed hosts
- **Ansible automation** for cluster-wide operations
- **tmux sessions** for multi-host management
- **Git-based configuration** synced to Forgejo

---

## Quick Start

### Launch Interactive Menu
```bash
~/scripts/control-menu.sh
```

### Launch Multi-Host SSH Session
```bash
~/scripts/ssh-manager.sh
```

### Run Ansible Ad-Hoc Command
```bash
cd ~/clustered-fucks   # ansible.cfg here points Ansible at the inventory
ansible all -m ping
ansible docker_hosts -m shell -a "docker ps --format '{% raw %}table {{.Names}}\t{{.Status}}{% endraw %}'"
```

Ansible renders ad-hoc arguments as Jinja2 templates. Docker's `{{...}}` format placeholders must therefore be wrapped in `{% raw %}...{% endraw %}`. Without the wrapper, Ansible tries to evaluate them and stops with a templating error.

---

## Directory Structure

```
/home/maddox/
├── .ssh/
│   ├── config              # SSH host definitions
│   ├── tmux-hosts.conf     # tmux session configuration
│   ├── id_ed25519          # SSH private key
│   └── id_ed25519.pub      # SSH public key (add to new hosts)
│
├── clustered-fucks/        # Git repo (synced to Forgejo)
│   ├── ansible.cfg         # Ansible configuration
│   ├── inventory/
│   │   ├── hosts.yml       # Host inventory
│   │   └── group_vars/
│   │       └── all.yml     # Global variables
│   └── playbooks/
│       ├── check-status.yml
│       ├── docker-prune.yml
│       ├── restart-utils.yml
│       ├── update-all.yml
│       └── deploy-utils.yml
│
└── scripts/
    ├── ssh-manager.sh      # tmux multi-host launcher
    ├── control-menu.sh     # Interactive Ansible menu
    └── add-host.sh         # New host onboarding
```

---

## Managed Hosts

| Host | IP (192.168.1.x) | User | Port | Type | Group |
|------|-----|------|------|------|-------|
| pve2 | .3 | root | 22 | Proxmox | proxmox_nodes |
| pve-dell | .4 | root | 22 | Proxmox | proxmox_nodes |
| replicant | .80 | maddox | 22 | VM | docker_hosts |
| databases | .81 | root | 22 | VM | docker_hosts |
| immich | .82 | root | 22 | VM | docker_hosts |
| media-transcode | .120 | root | 22 | LXC | docker_hosts |
| network-services | .121 | root | 22 | LXC | docker_hosts |
| download-stack | .122 | root | 22 | LXC | docker_hosts |
| docker666 | .123 | root | 22 | LXC | docker_hosts |
| tailscale-home | .124 | root | 22 | LXC | docker_hosts |
| dns-lxc | .125 | root | 22 | LXC | infrastructure |
| nas | .251 | maddox | 44822 | NAS | legacy |
| alien | .252 | maddox | 22 | Docker | legacy |

---

## Ansible Host Groups

| Group | Members | Use Case |
|-------|---------|----------|
| `all` | All 13 hosts | Connectivity tests |
| `docker_hosts` | 8 hosts | Docker operations |
| `all_managed` | 11 hosts | System updates |
| `proxmox_nodes` | pve2, pve-dell | Node-level ops |
| `infrastructure` | dns-lxc | Non-Docker infra |
| `legacy` | nas, alien | Manual operations |
| `vms` | replicant, databases, immich | VM-specific |
| `lxcs` | 6 LXC containers | LXC-specific |

---

## Playbooks Reference

Run these commands from `~/clustered-fucks` so that `ansible.cfg` and the relative `playbooks/` paths resolve.

### check-status.yml
Reports disk usage, memory usage, and container counts.

```bash
ansible-playbook playbooks/check-status.yml
```

**Target:** all_managed
**Output:** Per-host status line (Disk=X% Mem=X% Containers=X)

---

### update-all.yml
Runs `apt update` and `apt upgrade` on all Docker hosts.

```bash
ansible-playbook playbooks/update-all.yml

# Also reboot hosts that require it:
ansible-playbook playbooks/update-all.yml -e "reboot=true"
```

**Target:** docker_hosts
**Note:** Checks whether each host needs a reboot and reports it. It reboots only when run with `-e "reboot=true"`.

---

### docker-prune.yml
Cleans unused Docker resources (images, networks, build cache).
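For comparison, here is a rough manual equivalent for a single host. It is an assumption, not the playbook's actual tasks, which this guide does not show. It calls the standard `docker image prune`, `docker network prune`, and `docker builder prune` subcommands. The `docker_prune_unused` function and the `DRY_RUN` variable are hypothetical. With `--all`, image pruning removes every image not used by a container, not only dangling ones.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximates what docker-prune.yml cleans, on one host.
# Set DRY_RUN to any non-empty value to print the commands instead of running them.
docker_prune_unused() {
  local run="${DRY_RUN:+echo}"   # "echo" in dry-run mode, empty otherwise
  $run docker image prune --all --force     # images not used by any container
  $run docker network prune --force         # networks not used by any container
  $run docker builder prune --all --force   # all unused build cache
}
```

`DRY_RUN=1 docker_prune_unused` prints the three commands without running them. Call the function without `DRY_RUN` to actually prune. Stopped containers and volumes are not touched, which matches the resource list above.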

```bash
ansible-playbook playbooks/docker-prune.yml
```

**Target:** docker_hosts
**Note:** dns-lxc has no Docker. It is in `infrastructure`, not `docker_hosts`, so this playbook does not target it. If it is ever added to `docker_hosts`, the playbook will fail on it (see "Docker Command Not Found" under Troubleshooting).

---

### restart-utils.yml
Restarts the utils stack (watchtower, autoheal, docker-proxy) on the Docker hosts.

```bash
ansible-playbook playbooks/restart-utils.yml
```

**Target:** docker_hosts
**Note:** Uses the host-specific `docker_appdata` variable for non-standard paths. tailscale-home is in `docker_hosts` but does not run the utils stack (see Hosts Without Utils Stack).

---

### deploy-utils.yml
Deploys the standardized utils stack to a new host.

```bash
ansible-playbook playbooks/deploy-utils.yml --limit new-host
```

**Target:** docker_hosts (narrowed with `--limit`)
**Note:** Creates only the directory structure and `.env` file; the compose file must be added separately.

---

## Scripts Reference

### ssh-manager.sh

Launches a tmux session with SSH connections to all hosts.

```bash
~/scripts/ssh-manager.sh
```

**Features:**
- Window 0: Control (local shell)
- Windows 1-13: Individual host SSH sessions
- Final window: Multi-View (all hosts in split panes)

**Navigation:**
- `Ctrl+b` then `0`-`9` to switch to windows 0-9
- `Ctrl+b '` then the window number to reach window 10 or higher, or `Ctrl+b w` to pick from a list
- `Ctrl+b d` to detach (keeps session running)
- `tmux attach -t cluster` to reattach

---

### control-menu.sh

Interactive menu for common operations.

```bash
~/scripts/control-menu.sh
```

**Menu Options:**
```
[1] Ping All        - Test connectivity
[2] Check Status    - Disk/memory/containers
[3] Update All      - apt upgrade docker hosts
[4] Docker Prune    - Clean unused resources
[5] Restart Utils   - Restart utils stack everywhere

[A] Ad-hoc Command  - Run custom command
[I] Inventory       - Show host list
[S] SSH Manager     - Launch tmux session

[Q] Quit
```

---

### add-host.sh

Wizard for onboarding new hosts.

```bash
~/scripts/add-host.sh
```

**Steps:**
1. Prompts for hostname, IP, user, port, description
2. Tests SSH connectivity
3. Copies SSH key if needed
4. Adds to `~/.ssh/config`
5. Adds to `~/.ssh/tmux-hosts.conf`

**Note:** The script does not touch the Ansible inventory; edit it manually (see Adding a New Host).

---

## Common Operations

### SSH to a Specific Host
```bash
ssh replicant
ssh databases
ssh nas        # Uses port 44822 automatically (set in ~/.ssh/config)
```

### Run Command on All Docker Hosts
```bash
cd ~/clustered-fucks
ansible docker_hosts -m shell -a "docker ps -q | wc -l"
```

### Run Command on Specific Host
```bash
ansible replicant -m shell -a "df -h"
```

### Copy File to All Docker Hosts
```bash
ansible docker_hosts -m copy -a "src=/path/to/file dest=/path/to/dest"
```

### Check Specific Service
```bash
# {% raw %} stops Ansible's Jinja2 templating from parsing Docker's {{.Status}}
ansible docker_hosts -m shell -a "docker ps --filter name=watchtower --format '{% raw %}{{.Status}}{% endraw %}'"
```

### View Ansible Inventory
```bash
ansible-inventory --graph
ansible-inventory --list
```

---

## Git Workflow

### Repository Location
- **Local:** `~/clustered-fucks/`
- **Remote:** `ssh://git@192.168.1.81:2222/maddox/clustered-fucks.git`
- **Web:** https://git.3ddbrewery.com/maddox/clustered-fucks

### Standard Workflow
```bash
cd ~/clustered-fucks

# Make changes to playbooks/inventory
vim playbooks/new-playbook.yml

# Commit and push
git add -A
git commit -m "Add new playbook"
git push origin main
```

### Pull Latest Changes
```bash
cd ~/clustered-fucks
git pull origin main
```

---

## Adding a New Host

### 1. Run Onboarding Script
```bash
~/scripts/add-host.sh
```

### 2. Edit Ansible Inventory
```bash
vim ~/clustered-fucks/inventory/hosts.yml
```

Add the host under the appropriate group's `hosts:` key, indented to match:
```yaml
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
```

If the host uses a non-standard appdata path, also set `docker_appdata`:
```yaml
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
  docker_appdata: /custom/path/appdata
```

### 3. Test Connection
```bash
ansible new-host -m ping
```

### 4. Commit Changes
```bash
cd ~/clustered-fucks
git add -A
git commit -m "Add new-host to inventory"
git push origin main
```

---

## Troubleshooting

### SSH Connection Refused
```bash
# Verbose output shows where the connection fails
ssh -v hostname

# If the connection is refused, open a console from the Proxmox node hosting the guest:
# For LXC: pct enter <vmid>
# For VM:  qm terminal <vmid>   (needs a serial port on the VM; otherwise use the web UI console)

# Inside the container/VM:
apt install openssh-server
systemctl enable ssh
systemctl start ssh
```

### SSH Permission Denied
```bash
# Install the public key into the target's authorized_keys (prompts for the password)
ssh-copy-id hostname

# If it still fails, fix permissions on the target (via the Proxmox console):
chmod 700 ~
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chown -R root:root ~/.ssh   # or the appropriate user
```

### Ansible "Missing sudo password"
The host is configured with `ansible_become: yes`, but no become password is supplied and the remote user does not have passwordless sudo.

To fix it, choose one of these options:
- Remove `ansible_become: yes` from the inventory.
- Supply the password at runtime with `-K` (`--ask-become-pass`).
- Set up passwordless sudo on the target. Run the following as root, replacing `username` with the remote user:
```bash
echo "username ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/username
visudo -cf /etc/sudoers.d/username   # syntax-check the new file
```

### Playbook Skips Host
Check whether the host is in the correct group:
```bash
ansible-inventory --graph
```

Check host variables:
```bash
ansible-inventory --host hostname
```

### Docker Command Not Found
The host is in `docker_hosts` but does not have Docker. Remove it from `docker_hosts` and add it to the `infrastructure` group:
```yaml
infrastructure:
  hosts:
    hostname:
      ansible_host: 192.168.1.XXX
```

---

## Non-Standard Configurations

### Hosts with Different Appdata Paths

| Host | Path |
|------|------|
| replicant | `/home/maddox/docker/appdata` |
| docker666 | `/root/docker/appdata` |
| All others | `/home/docker/appdata` |

These paths are set through the `docker_appdata` inventory variable.

### Hosts with Non-Standard SSH

| Host | Port | User |
|------|------|------|
| nas | 44822 | maddox |

Configured in both `~/.ssh/config` and `inventory/hosts.yml`.
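The port and user have to agree between `~/.ssh/config` and `inventory/hosts.yml`. One way to keep them in sync is to generate both entries from the same values. The sketch below assumes bash. `ssh_host_stanza` and `inventory_entry` are hypothetical helpers that only print text; they are not part of `add-host.sh`. `ansible_host`, `ansible_user`, and `ansible_port` are Ansible's standard connection variables.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of add-host.sh): print matching SSH config and
# Ansible inventory entries for one host. Port 22 is the default and is omitted.

# Usage: ssh_host_stanza NAME IP USER [PORT]
ssh_host_stanza() {
  local name="$1" ip="$2" user="$3" port="${4:-22}"
  printf 'Host %s\n    HostName %s\n    User %s\n' "$name" "$ip" "$user"
  if [ "$port" != "22" ]; then
    printf '    Port %s\n' "$port"
  fi
}

# Usage: inventory_entry NAME IP USER [PORT]
# Output is meant to sit under a group's "hosts:" key; indent it to match.
inventory_entry() {
  local name="$1" ip="$2" user="$3" port="${4:-22}"
  printf '%s:\n  ansible_host: %s\n  ansible_user: %s\n' "$name" "$ip" "$user"
  if [ "$port" != "22" ]; then
    printf '  ansible_port: %s\n' "$port"
  fi
}
```

For example, `ssh_host_stanza nas 192.168.1.251 maddox 44822` prints the `Host nas` entry listed under Reference Files. `inventory_entry nas 192.168.1.251 maddox 44822` prints the corresponding inventory entry with `ansible_port: 44822`. Review the output before appending it to either file.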

### Hosts Without Utils Stack

| Host | Reason |
|------|--------|
| tailscale-home | Only runs Headscale, no utils needed |
| dns-lxc | No Docker installed |

---

## Maintenance

### Update Ansible
```bash
sudo apt update
sudo apt install --only-upgrade ansible
```

### Regenerate SSH Keys (if compromised)
```bash
# Generate a new key (confirm overwriting the old one when prompted)
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519

# Distribute to all hosts (will prompt for passwords)
for host in pve2 pve-dell replicant databases immich media-transcode network-services download-stack docker666 tailscale-home dns-lxc alien; do
    ssh-copy-id "$host"
done

# NAS uses a non-standard port
ssh-copy-id -p 44822 maddox@192.168.1.251
```

`ssh-copy-id` only adds the new key. Because the old key is compromised, also delete its line from `~/.ssh/authorized_keys` on every host.

### Backup Configuration
```bash
cd ~/clustered-fucks
git add -A
git commit -m "Backup: $(date +%Y-%m-%d)"
git push origin main
```

---

## Reference Files

### ~/.ssh/config
```
Host *
    StrictHostKeyChecking accept-new
    ServerAliveInterval 60
    ServerAliveCountMax 3

Host pve2
    HostName 192.168.1.3
    User root

Host pve-dell
    HostName 192.168.1.4
    User root

Host replicant
    HostName 192.168.1.80
    User maddox

Host databases
    HostName 192.168.1.81
    User root

Host immich
    HostName 192.168.1.82
    User root

Host media-transcode
    HostName 192.168.1.120
    User root

Host network-services
    HostName 192.168.1.121
    User root

Host download-stack
    HostName 192.168.1.122
    User root

Host docker666
    HostName 192.168.1.123
    User root

Host tailscale-home
    HostName 192.168.1.124
    User root

Host dns-lxc
    HostName 192.168.1.125
    User root

Host nas
    HostName 192.168.1.251
    User maddox
    Port 44822

Host alien
    HostName 192.168.1.252
    User maddox
```

### ~/clustered-fucks/ansible.cfg
```ini
[defaults]
inventory = inventory/hosts.yml
remote_user = root
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
stdout_callback = yaml
forks = 10

[privilege_escalation]
become = False

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
```

---

## Changelog

| Date | Change |
|------|--------|
| 2026-01-23 | Initial deployment, all hosts connected, playbooks tested |