Control Server Operations Guide
Host: control (CT 127)
IP: 192.168.1.127
Location: pve2
User: maddox
Last Updated: January 23, 2026
Overview
The control server is the centralized command center for managing the Proxmox cluster infrastructure. It provides:
- Passwordless SSH to all 13 managed hosts
- Ansible automation for cluster-wide operations
- tmux sessions for multi-host management
- Git-based configuration synced to Forgejo
Quick Start
Launch Interactive Menu
~/scripts/control-menu.sh
Launch Multi-Host SSH Session
~/scripts/ssh-manager.sh
Run Ansible Ad-Hoc Command
cd ~/clustered-fucks
ansible all -m ping
ansible docker_hosts -m shell -a "docker ps --format 'table {{.Names}}\t{{.Status}}'"
Directory Structure
/home/maddox/
├── .ssh/
│ ├── config # SSH host definitions
│ ├── tmux-hosts.conf # tmux session configuration
│ ├── id_ed25519 # SSH private key
│ └── id_ed25519.pub # SSH public key (add to new hosts)
│
├── clustered-fucks/ # Git repo (synced to Forgejo)
│ ├── ansible.cfg # Ansible configuration
│ ├── inventory/
│ │ ├── hosts.yml # Host inventory
│ │ └── group_vars/
│ │ └── all.yml # Global variables
│ └── playbooks/
│ ├── check-status.yml
│ ├── docker-prune.yml
│ ├── restart-utils.yml
│ ├── update-all.yml
│ └── deploy-utils.yml
│
└── scripts/
├── ssh-manager.sh # tmux multi-host launcher
├── control-menu.sh # Interactive Ansible menu
└── add-host.sh # New host onboarding
Managed Hosts
| Host | IP (192.168.1.x) | User | Port | Type | Group |
|---|---|---|---|---|---|
| pve2 | .3 | root | 22 | Proxmox | proxmox_nodes |
| pve-dell | .4 | root | 22 | Proxmox | proxmox_nodes |
| replicant | .80 | maddox | 22 | VM | docker_hosts |
| databases | .81 | root | 22 | VM | docker_hosts |
| immich | .82 | root | 22 | VM | docker_hosts |
| media-transcode | .120 | root | 22 | LXC | docker_hosts |
| network-services | .121 | root | 22 | LXC | docker_hosts |
| download-stack | .122 | root | 22 | LXC | docker_hosts |
| docker666 | .123 | root | 22 | LXC | docker_hosts |
| tailscale-home | .124 | root | 22 | LXC | docker_hosts |
| dns-lxc | .125 | root | 22 | LXC | infrastructure |
| nas | .251 | maddox | 44822 | NAS | legacy |
| alien | .252 | maddox | 22 | Docker | legacy |
Ansible Host Groups
| Group | Members | Use Case |
|---|---|---|
| all | All 13 hosts | Connectivity tests |
| docker_hosts | 8 hosts | Docker operations |
| all_managed | 11 hosts | System updates |
| proxmox_nodes | pve2, pve-dell | Node-level ops |
| infrastructure | dns-lxc | Non-Docker infra |
| legacy | nas, alien | Manual operations |
| vms | replicant, databases, immich | VM-specific |
| lxcs | 6 LXC containers | LXC-specific |
Playbooks Reference
check-status.yml
Reports disk usage, memory usage, and container counts.
ansible-playbook playbooks/check-status.yml
Target: all_managed
Output: Per-host status line (Disk=X% Mem=X% Containers=X)
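For a quick spot check between playbook runs, roughly the same data can be pulled ad-hoc (a sketch; the output formatting differs from the playbook's status line):
cd ~/clustered-fucks
ansible all_managed -m shell -a "df -h / | tail -1 && free -h | grep Mem"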
update-all.yml
Runs apt update and upgrade on all Docker hosts.
ansible-playbook playbooks/update-all.yml
# With reboot if required:
ansible-playbook playbooks/update-all.yml -e "reboot=true"
Target: docker_hosts
Note: Checks whether a reboot is required and reports it, but does not reboot unless run with -e "reboot=true"
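To see which hosts are waiting on a reboot without running the playbook, the standard Debian/Ubuntu marker file can be checked ad-hoc (assumes apt-based targets, which matches this cluster):
ansible docker_hosts -m shell -a "test -f /var/run/reboot-required && echo 'reboot required' || echo 'no reboot needed'"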
docker-prune.yml
Cleans unused Docker resources (images, networks, build cache).
ansible-playbook playbooks/docker-prune.yml
Target: docker_hosts
Note: dns-lxc will fail (no Docker) - this is expected
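To preview how much space a prune would reclaim before running it, Docker's built-in disk usage report works ad-hoc (dns-lxc will fail here too, for the same reason):
ansible docker_hosts -m shell -a "docker system df"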
restart-utils.yml
Restarts the utils stack (watchtower, autoheal, docker-proxy) on all hosts.
ansible-playbook playbooks/restart-utils.yml
Target: docker_hosts
Note: Uses host-specific docker_appdata variable for non-standard paths
deploy-utils.yml
Deploys standardized utils stack to a new host.
ansible-playbook playbooks/deploy-utils.yml --limit new-host
Target: docker_hosts
Note: Creates directory structure and .env file only; compose file must be added separately
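After adding the compose file and bringing the stack up, a quick way to confirm the project is registered on the new host (assumes Docker Compose v2 on the target):
ansible new-host -m shell -a "docker compose ls"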
Scripts Reference
ssh-manager.sh
Launches a tmux session with SSH connections to all hosts.
~/scripts/ssh-manager.sh
Features:
- Window 0: Control (local shell)
- Windows 1-13: Individual host SSH sessions
- Final window: Multi-View (all hosts in split panes)
Navigation:
- Ctrl+b then window number to switch windows
- Ctrl+b d to detach (keeps the session running)
- tmux attach -t cluster to reattach
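A couple of related tmux commands that come in handy (standard tmux, using the session name shown above):
tmux list-sessions              # confirm the cluster session is still running
tmux kill-session -t cluster    # tear the session down when finished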
control-menu.sh
Interactive menu for common operations.
~/scripts/control-menu.sh
Menu Options:
[1] Ping All - Test connectivity
[2] Check Status - Disk/memory/containers
[3] Update All - apt upgrade docker hosts
[4] Docker Prune - Clean unused resources
[5] Restart Utils - Restart utils stack everywhere
[A] Ad-hoc Command - Run custom command
[I] Inventory - Show host list
[S] SSH Manager - Launch tmux session
[Q] Quit
add-host.sh
Wizard for onboarding new hosts.
~/scripts/add-host.sh
Steps:
- Prompts for hostname, IP, user, port, description
- Tests SSH connectivity
- Copies SSH key if needed
- Adds to ~/.ssh/config
- Adds to ~/.ssh/tmux-hosts.conf
Note: Ansible inventory must be edited manually.
Common Operations
SSH to a Specific Host
ssh replicant
ssh databases
ssh nas # Uses port 44822 automatically
Run Command on All Docker Hosts
cd ~/clustered-fucks
ansible docker_hosts -m shell -a "docker ps -q | wc -l"
Run Command on Specific Host
ansible replicant -m shell -a "df -h"
Copy File to All Hosts
ansible docker_hosts -m copy -a "src=/path/to/file dest=/path/to/dest"
Check Specific Service
ansible docker_hosts -m shell -a "docker ps --filter name=watchtower --format '{{.Status}}'"
View Ansible Inventory
ansible-inventory --graph
ansible-inventory --list
Git Workflow
Repository Location
- Local: ~/clustered-fucks/
- Remote: ssh://git@192.168.1.81:2222/maddox/clustered-fucks.git
- Web: https://git.3ddbrewery.com/maddox/clustered-fucks
Standard Workflow
cd ~/clustered-fucks
# Make changes to playbooks/inventory
vim playbooks/new-playbook.yml
# Commit and push
git add -A
git commit -m "Add new playbook"
git push origin main
Pull Latest Changes
cd ~/clustered-fucks
git pull origin main
Adding a New Host
1. Run Onboarding Script
~/scripts/add-host.sh
2. Edit Ansible Inventory
vim ~/clustered-fucks/inventory/hosts.yml
Add under appropriate group:
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
If non-standard appdata path:
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
  docker_appdata: /custom/path/appdata
3. Test Connection
ansible new-host -m ping
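If ping succeeds, it is worth confirming that fact gathering works too, since the playbooks rely on it (optional check):
ansible new-host -m setup -a "filter=ansible_distribution*"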
4. Commit Changes
cd ~/clustered-fucks
git add -A
git commit -m "Add new-host to inventory"
git push origin main
Troubleshooting
SSH Connection Refused
# Check if SSH is running on target
ssh -v hostname
# If connection refused, access via Proxmox console:
# For LXC: pct enter <CT_ID>
# For VM: qm terminal <VM_ID>
# Inside container/VM:
apt install openssh-server
systemctl enable ssh
systemctl start ssh
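If it is unclear whether sshd is running at all, these checks from the Proxmox console narrow it down (standard systemd and iproute2 tooling, assumed present on the targets):
# Is the service up?
systemctl status ssh
# Is anything listening on port 22?
ss -tlnp | grep ':22'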
SSH Permission Denied
# Check key is in authorized_keys on target
ssh-copy-id hostname
# If still failing, check permissions on target:
# (via Proxmox console)
chmod 700 ~
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chown -R root:root ~/.ssh # or appropriate user
Ansible "Missing sudo password"
The host is configured with ansible_become: yes but no password is set.
Fix: Either remove ansible_become: yes from inventory, or set up passwordless sudo on target:
echo "username ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/username
Playbook Skips Host
Check if host is in the correct group:
ansible-inventory --graph
Check host variables:
ansible-inventory --host hostname
Docker Command Not Found
Host is in docker_hosts but doesn't have Docker. Move to infrastructure group:
infrastructure:
  hosts:
    hostname:
      ansible_host: 192.168.1.XXX
Non-Standard Configurations
Hosts with Different Appdata Paths
| Host | Path |
|---|---|
| replicant | /home/maddox/docker/appdata |
| docker666 | /root/docker/appdata |
| All others | /home/docker/appdata |
These are handled via docker_appdata variable in inventory.
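To confirm which path each host actually resolves, the variable can be dumped per host (read-only check):
cd ~/clustered-fucks
ansible docker_hosts -m debug -a "var=docker_appdata"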
Hosts with Non-Standard SSH
| Host | Port | User |
|---|---|---|
| nas | 44822 | maddox |
Configured in both ~/.ssh/config and inventory/hosts.yml.
Hosts Without Utils Stack
| Host | Reason |
|---|---|
| tailscale-home | Only runs Headscale, no utils needed |
| dns-lxc | No Docker installed |
Maintenance
Update Ansible
sudo apt update
sudo apt install --only-upgrade ansible
Regenerate SSH Keys (if compromised)
# Generate new key
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
# Distribute to all hosts (will prompt for passwords)
for host in pve2 pve-dell replicant databases immich media-transcode network-services download-stack docker666 tailscale-home dns-lxc alien; do
ssh-copy-id $host
done
# NAS requires special handling
ssh-copy-id -p 44822 maddox@192.168.1.251
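Once the new key is distributed, a cluster-wide ping confirms every host accepts it:
cd ~/clustered-fucks
ansible all -m ping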
Backup Configuration
cd ~/clustered-fucks
git add -A
git commit -m "Backup: $(date +%Y-%m-%d)"
git push origin main
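If you would rather not rely on remembering this, a cron entry on control could run the same commands weekly (a sketch, assuming the existing SSH key lets git push non-interactively; note the escaped % signs required in crontab):
# m h dom mon dow  command  (if nothing changed, the commit fails and the push is skipped)
0 3 * * 0 cd /home/maddox/clustered-fucks && git add -A && git commit -m "Backup: $(date +\%Y-\%m-\%d)" && git push origin main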
Reference Files
~/.ssh/config
Host *
StrictHostKeyChecking accept-new
ServerAliveInterval 60
ServerAliveCountMax 3
Host pve2
HostName 192.168.1.3
User root
Host pve-dell
HostName 192.168.1.4
User root
Host replicant
HostName 192.168.1.80
User maddox
Host databases
HostName 192.168.1.81
User root
Host immich
HostName 192.168.1.82
User root
Host media-transcode
HostName 192.168.1.120
User root
Host network-services
HostName 192.168.1.121
User root
Host download-stack
HostName 192.168.1.122
User root
Host docker666
HostName 192.168.1.123
User root
Host tailscale-home
HostName 192.168.1.124
User root
Host dns-lxc
HostName 192.168.1.125
User root
Host nas
HostName 192.168.1.251
User maddox
Port 44822
Host alien
HostName 192.168.1.252
User maddox
~/clustered-fucks/ansible.cfg
[defaults]
inventory = inventory/hosts.yml
remote_user = root
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
stdout_callback = yaml
forks = 10
[privilege_escalation]
become = False
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
Changelog
| Date | Change |
|---|---|
| 2026-01-23 | Initial deployment, all hosts connected, playbooks tested |