Add control server operations guide
Parent: 14f6348bf4
Commit: 4cb3a41f1c
1 changed file with 560 additions and 0 deletions
docs/control-server-guide.md (normal file, +560)

# Control Server Operations Guide

**Host:** control (CT 127)
**IP:** 192.168.1.127
**Location:** pve2
**User:** maddox
**Last Updated:** January 23, 2026

---

## Overview

The control server is the centralized command center for managing the Proxmox cluster infrastructure. It provides:

- **Passwordless SSH** to all 13 managed hosts
- **Ansible automation** for cluster-wide operations
- **tmux sessions** for multi-host management
- **Git-based configuration** synced to Forgejo

---
## Quick Start

### Launch Interactive Menu
```bash
~/scripts/control-menu.sh
```

### Launch Multi-Host SSH Session
```bash
~/scripts/ssh-manager.sh
```

### Run Ansible Ad-Hoc Command
```bash
cd ~/clustered-fucks
ansible all -m ping
# {% raw %} keeps Ansible's Jinja2 templating from parsing Docker's Go-template braces
ansible docker_hosts -m shell -a "docker ps --format 'table {% raw %}{{.Names}}\t{{.Status}}{% endraw %}'"
```
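One caveat with commands like the last one: Ansible renders module arguments through Jinja2, so Docker's Go-template braces generally need to be wrapped in `{% raw %}` tags, or the task aborts with a template error. The helper below is a small sketch of that wrapping; `raw_wrap` is a hypothetical name and not part of the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap a Go template so Jinja2 passes it through untouched.
raw_wrap() {
  printf '{%% raw %%}%s{%% endraw %%}' "$1"
}

# Example (run from ~/clustered-fucks):
#   ansible docker_hosts -m shell -a "docker ps --format '$(raw_wrap '{{.Names}}')'"
```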
---

## Directory Structure

```
/home/maddox/
├── .ssh/
│   ├── config               # SSH host definitions
│   ├── tmux-hosts.conf      # tmux session configuration
│   ├── id_ed25519           # SSH private key
│   └── id_ed25519.pub       # SSH public key (add to new hosts)
│
├── clustered-fucks/         # Git repo (synced to Forgejo)
│   ├── ansible.cfg          # Ansible configuration
│   ├── inventory/
│   │   ├── hosts.yml        # Host inventory
│   │   └── group_vars/
│   │       └── all.yml      # Global variables
│   └── playbooks/
│       ├── check-status.yml
│       ├── docker-prune.yml
│       ├── restart-utils.yml
│       ├── update-all.yml
│       └── deploy-utils.yml
│
└── scripts/
    ├── ssh-manager.sh       # tmux multi-host launcher
    ├── control-menu.sh      # Interactive Ansible menu
    └── add-host.sh          # New host onboarding
```

---
## Managed Hosts

| Host | IP (192.168.1.x) | User | Port | Type | Group |
|------|------------------|------|------|------|-------|
| pve2 | .3 | root | 22 | Proxmox | proxmox_nodes |
| pve-dell | .4 | root | 22 | Proxmox | proxmox_nodes |
| replicant | .80 | maddox | 22 | VM | docker_hosts |
| databases | .81 | root | 22 | VM | docker_hosts |
| immich | .82 | root | 22 | VM | docker_hosts |
| media-transcode | .120 | root | 22 | LXC | docker_hosts |
| network-services | .121 | root | 22 | LXC | docker_hosts |
| download-stack | .122 | root | 22 | LXC | docker_hosts |
| docker666 | .123 | root | 22 | LXC | docker_hosts |
| tailscale-home | .124 | root | 22 | LXC | docker_hosts |
| dns-lxc | .125 | root | 22 | LXC | infrastructure |
| nas | .251 | maddox | 44822 | NAS | legacy |
| alien | .252 | maddox | 22 | Docker | legacy |

---
## Ansible Host Groups

| Group | Members | Use Case |
|-------|---------|----------|
| `all` | All 13 hosts | Connectivity tests |
| `docker_hosts` | 8 hosts | Docker operations |
| `all_managed` | 11 hosts | System updates |
| `proxmox_nodes` | pve2, pve-dell | Node-level ops |
| `infrastructure` | dns-lxc | Non-Docker infra |
| `legacy` | nas, alien | Manual operations |
| `vms` | replicant, databases, immich | VM-specific |
| `lxcs` | 6 LXC containers | LXC-specific |

---
## Playbooks Reference

Run the commands in this section from `~/clustered-fucks`, so that `ansible.cfg` is picked up and the relative `playbooks/` paths resolve.

### check-status.yml
Reports disk usage, memory usage, and container counts.

```bash
ansible-playbook playbooks/check-status.yml
```
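The playbook itself is not reproduced in this guide. As a rough sketch of how a status line in the `Disk=X% Mem=X% Containers=X` shape could be put together on a single host, assuming `df`, `free`, and a local Docker CLI are available (`status_line` and `collect_status` are illustrative names, not taken from the playbook):

```bash
#!/usr/bin/env bash
# Format three already-collected numbers into the status line.
status_line() {
  printf 'Disk=%s%% Mem=%s%% Containers=%s\n' "$1" "$2" "$3"
}

# Collect the numbers on the local host; assumes df, free, and docker exist.
collect_status() {
  local disk mem containers
  # Use% column of the root filesystem, with the percent sign stripped.
  disk=$(df -P / | awk 'NR==2 {gsub("%", "", $5); print $5}')
  # Used memory as a whole-number percentage of total.
  mem=$(free | awk '/^Mem:/ {printf "%d", $3 / $2 * 100}')
  # Number of running containers.
  containers=$(docker ps -q | wc -l)
  status_line "$disk" "$mem" "$containers"
}
```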
**Target:** all_managed
**Output:** Per-host status line (Disk=X% Mem=X% Containers=X)

---

### update-all.yml
Runs `apt update` and `apt upgrade` on all Docker hosts.

```bash
ansible-playbook playbooks/update-all.yml

# With reboot if required:
ansible-playbook playbooks/update-all.yml -e "reboot=true"
```
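How the playbook detects a pending reboot is not shown in this guide. On Debian and Ubuntu hosts the usual signal is the marker file `/var/run/reboot-required`, so a manual check might look like the following sketch (the function names and the optional path argument are illustrative):

```bash
#!/usr/bin/env bash
# Succeeds when the reboot marker exists (Debian/Ubuntu convention).
needs_reboot() {
  local marker="${1:-/var/run/reboot-required}"
  [ -e "$marker" ]
}

# Print a one-line verdict for the local host.
report_reboot() {
  if needs_reboot "$@"; then
    echo "reboot required"
  else
    echo "no reboot needed"
  fi
}
```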
**Target:** docker_hosts
**Note:** Checks whether a reboot is required and reports it, but reboots only when run with `-e "reboot=true"`.

---

### docker-prune.yml
Cleans unused Docker resources (images, networks, build cache).

```bash
ansible-playbook playbooks/docker-prune.yml
```
**Target:** docker_hosts
**Note:** dns-lxc has no Docker and is grouped under `infrastructure`, so a run against `docker_hosts` does not touch it. If it is ever targeted, the play fails on it, which is expected.

---

### restart-utils.yml
Restarts the utils stack (watchtower, autoheal, docker-proxy) on all Docker hosts.

```bash
ansible-playbook playbooks/restart-utils.yml
```
**Target:** docker_hosts
**Note:** Uses the host-specific `docker_appdata` variable for non-standard paths

---

### deploy-utils.yml
Deploys the standardized utils stack to a new host.

```bash
ansible-playbook playbooks/deploy-utils.yml --limit new-host
```
**Target:** docker_hosts
**Note:** Creates the directory structure and `.env` file only; the compose file must be added separately

---

## Scripts Reference

### ssh-manager.sh

Launches a tmux session with SSH connections to all hosts.

```bash
~/scripts/ssh-manager.sh
```
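The script source is not included in this guide. As a sketch of the layout it is described as building (a local control window plus one SSH window per host, in a session named `cluster`), the function below prints the tmux commands rather than running them. `tmux_plan` is a hypothetical name, and the multi-view window is left out.

```bash
#!/usr/bin/env bash
# Print the tmux commands for a session with one window per host.
tmux_plan() {
  local session="$1"; shift
  # Window 0: local shell on the control server.
  echo "tmux new-session -d -s $session -n control"
  local host
  for host in "$@"; do
    # One window per host, named after it, running ssh.
    echo "tmux new-window -t $session: -n $host 'ssh $host'"
  done
  echo "tmux attach -t $session"
}

# Review the plan first, then pipe it to bash to execute:
#   tmux_plan cluster pve2 pve-dell replicant | bash
```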
**Features:**
- Window 0: Control (local shell)
- Windows 1-13: Individual host SSH sessions
- Final window: Multi-View (all hosts in split panes)

**Navigation:**
- `Ctrl+b` then window number to switch
- `Ctrl+b d` to detach (keeps session running)
- `tmux attach -t cluster` to reattach

---

### control-menu.sh

Interactive menu for common operations.

```bash
~/scripts/control-menu.sh
```
**Menu Options:**
```
[1] Ping All - Test connectivity
[2] Check Status - Disk/memory/containers
[3] Update All - apt upgrade docker hosts
[4] Docker Prune - Clean unused resources
[5] Restart Utils - Restart utils stack everywhere

[A] Ad-hoc Command - Run custom command
[I] Inventory - Show host list
[S] SSH Manager - Launch tmux session

[Q] Quit
```

---

### add-host.sh

Wizard for onboarding new hosts.

```bash
~/scripts/add-host.sh
```
**Steps:**
1. Prompts for hostname, IP, user, port, description
2. Tests SSH connectivity
3. Copies SSH key if needed
4. Adds to `~/.ssh/config`
5. Adds to `~/.ssh/tmux-hosts.conf`

**Note:** The wizard does not touch the Ansible inventory; `inventory/hosts.yml` must be edited manually.

---

## Common Operations

### SSH to a Specific Host
```bash
ssh replicant
ssh databases
ssh nas # Uses port 44822 automatically
```
### Run Command on All Docker Hosts
```bash
cd ~/clustered-fucks
ansible docker_hosts -m shell -a "docker ps -q | wc -l"
```

### Run Command on Specific Host
```bash
ansible replicant -m shell -a "df -h"
```

### Copy File to All Docker Hosts
```bash
ansible docker_hosts -m copy -a "src=/path/to/file dest=/path/to/dest"
```

### Check Specific Service
```bash
# {% raw %} keeps Ansible's Jinja2 templating from parsing Docker's Go-template braces
ansible docker_hosts -m shell -a "docker ps --filter name=watchtower --format '{% raw %}{{.Status}}{% endraw %}'"
```

### View Ansible Inventory
```bash
ansible-inventory --graph
ansible-inventory --list
```
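To cross-check group sizes against the Ansible Host Groups table, the `--graph` output can be counted. The filter below is a sketch that assumes the usual `|--hostname` and `|--@group:` line format of `ansible-inventory --graph`; `count_graph_hosts` is an illustrative name.

```bash
#!/usr/bin/env bash
# Count unique hosts in `ansible-inventory --graph` output read from stdin.
count_graph_hosts() {
  # Host lines look like "  |  |--name"; group lines carry an "@" prefix.
  grep -E '\|--[^@]' | sed 's/.*|--//' | sort -u | wc -l
}

# Example: ansible-inventory --graph docker_hosts | count_graph_hosts
```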
---

## Git Workflow

### Repository Location
- **Local:** `~/clustered-fucks/`
- **Remote:** `ssh://git@192.168.1.81:2222/maddox/clustered-fucks.git`
- **Web:** https://git.3ddbrewery.com/maddox/clustered-fucks

### Standard Workflow
```bash
cd ~/clustered-fucks

# Make changes to playbooks/inventory
vim playbooks/new-playbook.yml

# Commit and push
git add -A
git commit -m "Add new playbook"
git push origin main
```

### Pull Latest Changes
```bash
cd ~/clustered-fucks
git pull origin main
```
---

## Adding a New Host

### 1. Run Onboarding Script
```bash
~/scripts/add-host.sh
```
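For reference, the `~/.ssh/config` stanza the wizard appends presumably follows the same shape as the entries listed under Reference Files. A hypothetical generator (not the actual `add-host.sh` code) could look like this; it emits `Port` only for non-default ports.

```bash
#!/usr/bin/env bash
# Print an ssh_config stanza: ssh_stanza NAME IP USER [PORT]
ssh_stanza() {
  local name="$1" ip="$2" user="$3" port="${4:-22}"
  printf 'Host %s\n' "$name"
  printf '    HostName %s\n' "$ip"
  printf '    User %s\n' "$user"
  # Port 22 is the SSH default, so it can be omitted.
  if [ "$port" != "22" ]; then
    printf '    Port %s\n' "$port"
  fi
}

# Example: ssh_stanza nas 192.168.1.251 maddox 44822 >> ~/.ssh/config
```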
### 2. Edit Ansible Inventory
```bash
vim ~/clustered-fucks/inventory/hosts.yml
```

Add the host under the appropriate group's `hosts:` key:
```yaml
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
```

If the host uses a non-standard appdata path, set `docker_appdata` as well:
```yaml
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
  docker_appdata: /custom/path/appdata
```
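Because YAML is indentation-sensitive, it can help to generate the stanza rather than type it. This sketch prints a host entry at a given indent depth; `inventory_entry` and its argument order are assumptions, and the output still has to be pasted under the right group by hand.

```bash
#!/usr/bin/env bash
# inventory_entry NAME IP USER [APPDATA] [INDENT]
inventory_entry() {
  local name="$1" ip="$2" user="$3" appdata="${4:-}" indent="${5:-4}"
  local pad
  # Build a string of INDENT spaces.
  pad=$(printf '%*s' "$indent" '')
  printf '%s%s:\n' "$pad" "$name"
  printf '%s  ansible_host: %s\n' "$pad" "$ip"
  printf '%s  ansible_user: %s\n' "$pad" "$user"
  # docker_appdata is only needed for non-standard paths.
  if [ -n "$appdata" ]; then
    printf '%s  docker_appdata: %s\n' "$pad" "$appdata"
  fi
}
```

Running `ansible-inventory --host new-host` afterwards shows whether the entry parsed as intended.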
### 3. Test Connection
```bash
ansible new-host -m ping
```

### 4. Commit Changes
```bash
cd ~/clustered-fucks
git add -A
git commit -m "Add new-host to inventory"
git push origin main
```
---

## Troubleshooting

### SSH Connection Refused
```bash
# Check if SSH is running on target
ssh -v hostname

# If connection refused, access via Proxmox console:
# For LXC: pct enter <CT_ID>
# For VM: qm terminal <VM_ID>

# Inside container/VM:
apt install openssh-server
systemctl enable ssh
systemctl start ssh
```

### SSH Permission Denied
```bash
# Check key is in authorized_keys on target
ssh-copy-id hostname

# If still failing, check permissions on target:
# (via Proxmox console)
chmod 700 ~
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chown -R root:root ~/.ssh # or appropriate user
```
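A quick way to confirm those modes on the target before changing anything is sketched below. It assumes GNU `stat` (as found on Debian-based hosts), and `check_ssh_perms` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Report whether an .ssh directory and its authorized_keys are 700 and 600.
check_ssh_perms() {
  local dir="${1:-$HOME/.ssh}" ok=0 mode
  mode=$(stat -c '%a' "$dir") || return 2
  [ "$mode" = "700" ] || { echo "$dir is $mode, expected 700"; ok=1; }
  mode=$(stat -c '%a' "$dir/authorized_keys") || return 2
  [ "$mode" = "600" ] || { echo "authorized_keys is $mode, expected 600"; ok=1; }
  return "$ok"
}
```

A non-zero exit status means at least one mode differs; the printed lines say which.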
### Ansible "Missing sudo password"
The host is configured with `ansible_become: yes` but no sudo password is supplied.

Fix: Either remove `ansible_become: yes` from the inventory, or set up passwordless sudo on the target (run as root, replacing `username`):
```bash
echo "username ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/username
chmod 440 /etc/sudoers.d/username && visudo -cf /etc/sudoers.d/username
```

### Playbook Skips Host
Check if the host is in the correct group:
```bash
ansible-inventory --graph
```

Check host variables:
```bash
ansible-inventory --host hostname
```

### Docker Command Not Found
The host is in `docker_hosts` but doesn't have Docker. Move it to the `infrastructure` group:
```yaml
infrastructure:
  hosts:
    hostname:
      ansible_host: 192.168.1.XXX
```
---

## Non-Standard Configurations

### Hosts with Different Appdata Paths

| Host | Path |
|------|------|
| replicant | `/home/maddox/docker/appdata` |
| docker666 | `/root/docker/appdata` |
| All others | `/home/docker/appdata` |

These are handled via the `docker_appdata` variable in the inventory.
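
For shell scripts that run outside Ansible, the same mapping could be mirrored with a small lookup. This is only a sketch; `appdata_path` is a hypothetical function, and the paths are copied from the table above.

```bash
#!/usr/bin/env bash
# Return the appdata directory for a host, per the table above.
appdata_path() {
  case "$1" in
    replicant) echo "/home/maddox/docker/appdata" ;;
    docker666) echo "/root/docker/appdata" ;;
    *)         echo "/home/docker/appdata" ;;
  esac
}

# Example: ssh databases "ls $(appdata_path databases)"
```

The inventory remains the source of truth; a copy like this has to be kept in sync by hand.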
### Hosts with Non-Standard SSH

| Host | Port | User |
|------|------|------|
| nas | 44822 | maddox |

Configured in both `~/.ssh/config` and `inventory/hosts.yml`.

### Hosts Without Utils Stack

| Host | Reason |
|------|--------|
| tailscale-home | Only runs Headscale, no utils needed |
| dns-lxc | No Docker installed |

---
## Maintenance

### Update Ansible
```bash
sudo apt update
sudo apt install --only-upgrade ansible
```

### Regenerate SSH Keys (if compromised)
```bash
# Generate new key (ssh-keygen asks before overwriting the existing one)
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519

# Distribute to all hosts (will prompt for passwords)
for host in pve2 pve-dell replicant databases immich media-transcode network-services download-stack docker666 tailscale-home dns-lxc alien; do
  ssh-copy-id "$host"
done

# NAS requires special handling
ssh-copy-id -p 44822 maddox@192.168.1.251

# Finally, remove the old (compromised) public key from ~/.ssh/authorized_keys on every host
```
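Distributing a new key does not revoke the old one. A sketch for stripping a compromised public key from an `authorized_keys` file follows; `remove_key` is a hypothetical helper, and it matches on the base64 key body so that differing comments do not matter.

```bash
#!/usr/bin/env bash
# remove_key FILE KEY_BODY: drop every line containing KEY_BODY from FILE.
remove_key() {
  local file="$1" key="$2" tmp
  [ -f "$file" ] || return 1
  # An empty pattern would match (and delete) every line, so refuse it.
  [ -n "$key" ] || return 1
  tmp=$(mktemp) || return 1
  # grep -v exits 1 when no lines remain, which is not an error here.
  grep -vF -- "$key" "$file" > "$tmp" || true
  # Write back in place so the file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example, on each host (the key body is the second field of the old .pub file):
#   remove_key ~/.ssh/authorized_keys "<old-key-body>"
```

Run it only after confirming that login with the new key works; otherwise the host may become unreachable over SSH.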
### Backup Configuration
```bash
cd ~/clustered-fucks
git add -A
git commit -m "Backup: $(date +%Y-%m-%d)"
git push origin main
```
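When nothing has changed, `git commit` exits non-zero and the push that follows has nothing new to send. A variant that commits only when the working tree is dirty might look like this sketch (`backup_config` is an illustrative name; the repo path and `main` branch are taken from the commands above).

```bash
#!/usr/bin/env bash
# Commit and push only when there is something to back up.
backup_config() {
  local repo="${1:-$HOME/clustered-fucks}"
  # Bail out early if the path is not a Git repository.
  git -C "$repo" rev-parse --git-dir >/dev/null 2>&1 || return 1
  # --porcelain prints one line per changed or untracked path.
  if [ -z "$(git -C "$repo" status --porcelain)" ]; then
    echo "nothing to back up"
    return 0
  fi
  git -C "$repo" add -A &&
    git -C "$repo" commit -m "Backup: $(date +%Y-%m-%d)" &&
    git -C "$repo" push origin main
}
```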
---

## Reference Files

### ~/.ssh/config
```
Host *
    StrictHostKeyChecking accept-new
    ServerAliveInterval 60
    ServerAliveCountMax 3

Host pve2
    HostName 192.168.1.3
    User root

Host pve-dell
    HostName 192.168.1.4
    User root

Host replicant
    HostName 192.168.1.80
    User maddox

Host databases
    HostName 192.168.1.81
    User root

Host immich
    HostName 192.168.1.82
    User root

Host media-transcode
    HostName 192.168.1.120
    User root

Host network-services
    HostName 192.168.1.121
    User root

Host download-stack
    HostName 192.168.1.122
    User root

Host docker666
    HostName 192.168.1.123
    User root

Host tailscale-home
    HostName 192.168.1.124
    User root

Host dns-lxc
    HostName 192.168.1.125
    User root

Host nas
    HostName 192.168.1.251
    User maddox
    Port 44822

Host alien
    HostName 192.168.1.252
    User maddox
```
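Since every managed host has a `Host` entry here, the file can double as a host list for loops such as the key distribution under Maintenance. The sketch below assumes one alias per `Host` line and skips the `*` wildcard; `list_ssh_hosts` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Print the aliases defined in an ssh_config file, one per line.
list_ssh_hosts() {
  awk '$1 == "Host" && $2 != "*" { print $2 }' "${1:-$HOME/.ssh/config}"
}

# Example: for host in $(list_ssh_hosts); do ssh "$host" uptime; done
```

Hosts on non-default ports, such as nas, need no special casing in such a loop because the port comes from the same file.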
### ~/clustered-fucks/ansible.cfg
```ini
[defaults]
inventory = inventory/hosts.yml
remote_user = root
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
stdout_callback = yaml
forks = 10

[privilege_escalation]
become = False

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
```
---

## Changelog

| Date | Change |
|------|--------|
| 2026-01-23 | Initial deployment, all hosts connected, playbooks tested |